Cross-Company Data Sharing Using Distributed Analytics
Abstract
1. Introduction
- Lack of mutual trust, understanding, and interaction;
- Fear of losing control over sensitive data;
- IT infrastructure limitations for external data exchange.
2. Literature Review and Requirement Analysis
2.1. Data Analysis in Supply Chains
2.2. Supply Chain Collaboration
2.3. Use Case: Supplier Ratings
2.4. Requirement Analysis and Current Technologies
3. Applying a Distributed Analytics Framework to a Supply Chain Use Case
3.1. Use Case Setup
3.2. The PADME Architecture
- Stations: The two machines in our setup represent the Stations in the PADME framework. Each machine simulates a separate company, with its own local PostgreSQL database storing supplier delivery data. The two simulated companies (Stations) must be properly registered and recognized by the CS before they can execute the analytic tasks.
- Train: The analytic task in our case is implemented as a Python 3.10 script that is containerized using Docker. This script, representing the Train, is transmitted to the respective Stations. Upon reaching a Station, the script can be inspected and executed. The Train’s primary function is to request access to the local database, calculate the on-time delivery score as defined earlier, and store the result so that it can be returned to the requester.
- Central Services: The CS are an existing application implemented by the PADME developers and can be compared to a train operation service. These services coordinate the transmission of the containerized Python script (Train) between the Stations.
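To make the role of the Train more concrete, the following is a minimal sketch of the kind of computation such a script might run at a Station. It is not the authors' script. It assumes the on-time delivery score is the share of deliveries that arrived on or before the promised date (the definition used in the use case may differ), it uses an in-memory SQLite database as a stand-in for the Station's local PostgreSQL database, and the table and column names (`deliveries`, `supplier_id`, `promised_date`, `actual_date`) are hypothetical.

```python
import sqlite3


def make_demo_station():
    """Build an in-memory stand-in for one Station's local database.

    Assumption: the real setup uses PostgreSQL; SQLite is used here only so
    that the sketch runs without external services.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deliveries ("
        "supplier_id TEXT, promised_date TEXT, actual_date TEXT)"
    )
    # Synthetic rows; dates are ISO strings, so string comparison orders them.
    conn.executemany(
        "INSERT INTO deliveries VALUES (?, ?, ?)",
        [
            ("S1", "2025-01-10", "2025-01-09"),  # early
            ("S1", "2025-01-15", "2025-01-15"),  # exactly on time
            ("S1", "2025-01-20", "2025-01-23"),  # late
            ("S1", "2025-02-01", "2025-02-01"),  # exactly on time
            ("S2", "2025-01-05", "2025-01-08"),  # late
        ],
    )
    return conn


def on_time_delivery_score(conn, supplier_id):
    """Return an aggregate result for one supplier, or None without data.

    Only the aggregate (count and score) is returned; the individual
    delivery records stay in the Station's database.
    """
    rows = conn.execute(
        "SELECT promised_date, actual_date FROM deliveries "
        "WHERE supplier_id = ?",
        (supplier_id,),
    ).fetchall()
    if not rows:
        return None
    on_time = sum(1 for promised, actual in rows if actual <= promised)
    return {
        "supplier_id": supplier_id,
        "deliveries": len(rows),
        "score": on_time / len(rows),
    }


station = make_demo_station()
print(on_time_delivery_score(station, "S1"))
```

In this sketch the function returns only a count and a ratio, which mirrors the idea that the result, not the underlying records, travels back to the requester.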
3.3. Use Case Execution
3.4. Data Privacy
3.5. Data Quality Assessment
- First, it checks for data completeness, ensuring that all necessary attributes are present;
- Second, it verifies the plausibility of the data, filtering out unrealistic or inconsistent entries;
- Finally, it applies validity thresholds to ensure that only high-quality data are used for analysis.
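The three checks above can be sketched as a small filter that runs before the score is computed. This is an illustrative sketch only: the required fields, the plausibility rule (a delivery may deviate from its promised date by at most `max_delay_days`), and the threshold `min_valid_share` are assumptions chosen for the example, not values taken from the study.

```python
from datetime import date

# Hypothetical set of attributes the analysis needs.
REQUIRED_FIELDS = ("supplier_id", "promised_date", "actual_date")


def is_complete(record):
    """Completeness: every required attribute is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def is_plausible(record, max_delay_days=365):
    """Plausibility: the deviation from the promised date is realistic."""
    delay = (record["actual_date"] - record["promised_date"]).days
    return abs(delay) <= max_delay_days


def assess(records, min_valid_share=0.8):
    """Apply the checks in order and report whether the data pass the threshold."""
    complete = [r for r in records if is_complete(r)]
    usable = [r for r in complete if is_plausible(r)]
    share = len(usable) / len(records) if records else 0.0
    report = {
        "total": len(records),
        "complete": len(complete),
        "usable": len(usable),
        "valid_share": share,
        "passed": share >= min_valid_share,
    }
    return report, usable


records = [
    {"supplier_id": "S1", "promised_date": date(2025, 1, 10), "actual_date": date(2025, 1, 9)},
    {"supplier_id": "S1", "promised_date": date(2025, 1, 15), "actual_date": date(2025, 1, 15)},
    {"supplier_id": "S1", "promised_date": date(2025, 1, 20), "actual_date": date(2025, 1, 23)},
    {"supplier_id": "S1", "promised_date": date(2025, 1, 25), "actual_date": None},  # incomplete
    {"supplier_id": "S1", "promised_date": date(2025, 1, 1), "actual_date": date(2035, 1, 1)},  # implausible
]

report, usable = assess(records)
print(report)
```

Returning the report alongside the filtered records lets a Station see why a data set was rejected, rather than only receiving a missing result.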
4. Discussion
4.1. Key Contributions
- Domain-specific requirements: We identified and structured key technical and organizational requirements for supply chain data sharing.
- Adapting PADME to supply chain scenarios: We compared the requirements of the supply chain domain to the healthcare domain, and transferred a suitable framework from the healthcare domain to a use case of calculating on-time delivery metrics across companies.
- Enabling sovereignty-conscious KPI exchange: The system enables KPI computation without requiring centralization of sensitive data, preserving data sovereignty.
- Integrating privacy and quality functions: We have explored the integration of privacy-preserving techniques (k-anonymity, l-diversity) and the implementation of a data quality assistant to support trust and data integrity.
4.2. Limitations
- Limited scale of evaluation: Our prototype setup involved only two simulated companies with synthetic data. While PADME theoretically supports larger networks, the evaluation of scalability in real-world, multi-actor supply chains was beyond the scope of this study.
- Manual approval workflow: The current data sovereignty model requires each company to manually accept or reject each analytic request. While this supports strict data control, it introduces administrative overhead and may hinder responsiveness in time-sensitive scenarios.
- Central Services as a single point of coordination: The need for participating Stations to register with the CS introduces a potential point of vulnerability. If compromised, the CS could expose metadata or enable unauthorized train requests, underlining the need for robust access and monitoring mechanisms.
- Dependence on schema consistency: The analytic script assumes a predefined schema for database access. If a company uses alternative column names or data structures, the train will be unable to perform the intended computation.
- Privacy–utility trade-off: While privacy-enhancing techniques such as k-anonymity and l-diversity reduce re-identification risks, they may limit the granularity or interpretability of results. Future research should explore whether meaningful analysis of performance trends remains possible under stricter privacy constraints.
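The privacy–utility trade-off noted above can be illustrated with a small sketch. It groups records by hypothetical quasi-identifiers (`region`, `category`) and releases a group's on-time score only if the group contains at least `k` records and at least `l` distinct values of the sensitive attribute (the simplest, "distinct" form of l-diversity). All field names, data, and parameter values are assumptions for illustration; this is a suppression-based approximation, not the procedure used in the prototype.

```python
from collections import defaultdict


def release_scores(records, k, l):
    """Release per-group scores only for groups that satisfy both thresholds.

    Returns the released scores and the number of suppressed groups, so the
    loss of granularity caused by stricter settings is visible.
    """
    groups = defaultdict(list)
    for record in records:
        key = (record["region"], record["category"])
        groups[key].append(record["on_time"])

    released = {}
    suppressed = 0
    for key, flags in groups.items():
        large_enough = len(flags) >= k        # k-anonymity-style size check
        diverse_enough = len(set(flags)) >= l  # distinct l-diversity check
        if large_enough and diverse_enough:
            released[key] = sum(flags) / len(flags)
        else:
            suppressed += 1
    return released, suppressed


records = [
    {"region": "EU", "category": "steel", "on_time": True},
    {"region": "EU", "category": "steel", "on_time": True},
    {"region": "EU", "category": "steel", "on_time": False},
    {"region": "EU", "category": "plastics", "on_time": True},
    {"region": "EU", "category": "plastics", "on_time": True},
    {"region": "EU", "category": "plastics", "on_time": True},
    {"region": "US", "category": "steel", "on_time": False},
]

for k, l in [(1, 1), (3, 1), (3, 2)]:
    released, suppressed = release_scores(records, k, l)
    print(k, l, sorted(released), suppressed)
```

With these synthetic records, raising `k` and `l` removes groups from the released output, which is the loss of granularity the limitation describes.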
4.3. Future Research Directions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
KPI | Key Performance Indicator |
PADME | Platform for Analytics and Distributed Machine Learning for Enterprises |
PHT | Personal Health Train |
IDS | International Data Spaces |
TEEs | Trusted Execution Environments |
CS | Central Services |
API | Application Programming Interface |
AI | Artificial Intelligence |
1 | https://internationaldataspaces.org/adopt/data-spaces-radar/ (accessed on 18 March 2025) |
2 | https://padme-analytics.de/ (accessed on 18 March 2025) |
3 | https://www.go-fair.org/implementation-networks/overview/personal-health-train/ (accessed on 18 March 2025) |
4 | https://www.snomed.org/ (accessed on 18 March 2025) |
5 | https://spark.apache.org (accessed on 18 March 2025) |
6 | https://flink.apache.org (accessed on 18 March 2025) |
7 | https://distributedlearning.ai/ (accessed on 18 March 2025) |
8 | https://specs.fairdatatrain.org/ (accessed on 18 March 2025) |
9 | https://datashield.org/ (accessed on 18 March 2025) |
10 | https://docs.openmined.org/en/latest/ (accessed on 18 March 2025) |
11 | https://www.keycloak.org/ (accessed on 18 March 2025) |
Requirement | Technologies |
---|---|
Data privacy | Data anonymization; differential privacy |
Security | Encryption; multi-party computation; trusted execution environments |
Trust and governance | Blockchain; federated governance models |
Data sovereignty and control | Smart contracts; decentralized access control |
Interoperability | Vocabularies and ontologies; standardized data exchange protocols |
Data quality | Collaborative data quality management |
Actionable insights | Centralized analytics; distributed analytics |
Dimension | Healthcare Domain | Supply Chain Domain |
---|---|---|
Data privacy | Mandatory due to regulation (e.g., GDPR) | Critical due to business confidentiality and competition |
Security | High sensitivity (e.g., patient data, GDPR-regulated) | High sensitivity (e.g., trade secrets, supplier performance data) |
Trust and governance | Strong legal regulations | Emerging governance frameworks (e.g., IDS, Gaia-X) |
Data sovereignty and control | Data distributed across hospitals; individual data ownership by patients | Data and ownership distributed across independent companies in the supply chain |
Interoperability | Different systems, medical terminologies (e.g., SNOMED) | Heterogeneous systems, varying KPIs and taxonomies |
Data quality | Varies across institutions; supported by standards like HL7/FHIR | Inconsistent across firms; lacks standardized KPIs and formats |
Actionable insights | Research collaboration, improving public health outcomes | Operational optimization and benchmarking |
Framework | Focus | Deployability | Scope of Analytics Tasks |
---|---|---|---|
PADME | Data sovereignty and decentralized analytics | High (Dockerized, moderate setup effort) | High (suited for a high variety of computation tasks) |
Federated learning | Collaborative machine learning model training | Medium (requires ML frameworks, model synchronization, substantial setup) | Medium (suited specifically for predictive modeling tasks) |
Multi-party computation | Strong cryptographic security for joint computation | Low (computationally intensive, complex integration) | Low (strict requirements for defining functions) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kim, S.-Y.; Berninger, S.; Kocher, M.; Perau, M.; Geisler, S. Cross-Company Data Sharing Using Distributed Analytics. Systems 2025, 13, 418. https://doi.org/10.3390/systems13060418