Article

Cross-Company Data Sharing Using Distributed Analytics

by Soo-Yon Kim 1,*, Stefanie Berninger 2, Max Kocher 1, Martin Perau 2 and Sandra Geisler 1

1 Data Stream Management and Analysis, RWTH Aachen University, 52074 Aachen, Germany
2 Institute for Industrial Management at RWTH Aachen University, 52074 Aachen, Germany
* Author to whom correspondence should be addressed.
Systems 2025, 13(6), 418; https://doi.org/10.3390/systems13060418
Submission received: 18 March 2025 / Revised: 18 May 2025 / Accepted: 22 May 2025 / Published: 29 May 2025
(This article belongs to the Special Issue New Trends in Sustainable Operations and Supply Chain Management)

Abstract

Decision making in modern supply chain management relies heavily on data-driven decision support. Companies show a growing interest in building insights not only from data within their own boundaries, but also from data provided by collaborators and other actors in the market. While data and information sharing has been addressed in previous works, studies of practical implementations in the supply chain domain remain scarce. Our aim is to conduct a technical feasibility study of data sharing in supply chain management. We analyze the requirements for cross-company data sharing in supply chains and discuss existing technologies that enable such collaboration. We apply a distributed analytics framework that has already been implemented in the healthcare domain to a simulated use case of key performance indicator (KPI) exchange between supply chain actors. We find that the application is able to compute and exchange KPIs from the simulated companies’ datasets without requiring centralization of the databases. Furthermore, we find that the framework supports the integration of data quality assessment and privacy preservation mechanisms. The application thus yields promising results with regard to technical feasibility. Factors that may facilitate scalability are discussed as directions for future research.

1. Introduction

Data-driven decision making is highly relevant in modern supply chain management, particularly for addressing challenges such as the disruptions seen during the COVID-19 pandemic [1,2]. As the need for robust and sustainable decisions grows, companies are increasingly interested in building insights based not only on internal data, but also on data from external sources [3]. This development emphasizes the growing role of data sharing, the process of making data available to third parties, which can include companies, individuals, or public institutions [4].
Data sharing in the sense of cross-organizational collaboration has the potential to benefit organizations, as multiple parties can use the same data resource for various applications. McKinsey [5] highlighted the importance of collaboration and coordination throughout the supply chain as crucial strategies to address current and future challenges. The benefits of such data sharing in supply chain management are substantial [3,6,7]: it enables superior knowledge throughout the network, leading to faster and better-informed decision making. Companies may achieve reduced latencies and time-to-market while improving product quality. Beyond operational improvements, data sharing may foster novel types of collaborations within supply chains and enable new value-added services.
However, despite these recognized benefits, the implementation of data sharing systems remains challenging. Many organizations maintain data silos due to [8]:
  • Lack of mutual trust, understanding, and interaction;
  • Fear of losing control over sensitive data;
  • IT infrastructure limitations for external data exchange.
To address these challenges, implementing cross-company data sharing requires several key components [9]. First, legal frameworks need to be analyzed and governance structures developed to regulate data access and usage. Second, technologies need to ensure efficient and secure data transfer while maintaining data quality and transparency. Third, infrastructure and platforms must be established that facilitate data exchange.
While data sharing frameworks exist in various domains, supply chains present unique challenges that require specific solutions. These become evident when comparing different types of confidentiality across domains. For example, in healthcare, data sharing primarily concerns the personal privacy rights of patients, whereas supply chain data involve business-sensitive information that directly affects competitive advantages and market positions [10,11]. While patient data privacy mandates the strict anonymization of individual patients, corporate privacy standards for sharing business-sensitive information are more flexible, varying according to the sensitivity of the specific information. Thus, privacy mechanisms for data sharing must be designed to be granular and controllable so that they can be adapted to the context at hand. Different domains also operate under distinct incentive structures. In academic research, for instance, data sharing is driven by citation metrics and scientific recognition. In contrast, businesses require clear operational or financial benefits to justify sharing their proprietary information [11]. These fundamental differences create a more complex environment for establishing trust and governance mechanisms in supply chains, as the shared data directly impact business relationships and market dynamics.
To structure these complex data sharing environments, the concept of data ecosystems has evolved [12,13]. These frameworks help model and analyze data sharing scenarios by defining key entities and their interactions. Data ecosystems specify different types of actors (such as organizations or individuals), their roles in the ecosystem, the resources they handle (such as datasets or services), and their attributes. In addition, they define the relationships between these components, helping to understand data flows, access rights, and dependencies within the system.
Building on these concepts, initiatives such as the International Data Spaces (IDS) and Gaia-X provide comprehensive frameworks for data sharing [14,15]. They combine governance models with technical solutions—for instance, IDS defines both specific roles (data providers, consumers, brokers, and clearing houses) and technical components like the IDS connector for standardized data exchange [14]. These frameworks offer solid foundations for secure and standardized data sharing. However, while their application in specific domains is being explored¹, there remains a gap between these general frameworks and practical implementations. The supply chain sector in particular requires detailed exploration of realistic application scenarios and the technical feasibility of data sharing in those use cases.
To bridge the gap between theoretical frameworks and implementation, our approach focuses on creating a concrete supply chain data sharing scenario while building on the foundations of existing frameworks. Our implementation builds on distributed analytics, a method we selected for its focus on generating insights from distributed data sources, allowing us to start with practical use case development while progressively evaluating broader requirements for collaborative data analysis in the supply chain domain. One framework developed for the healthcare domain is the Platform for Analytics and Distributed Machine Learning for Enterprises (PADME)², which builds on the Personal Health Train (PHT)³ concept [16,17]. PADME has found adoption in the healthcare domain, where it has already been applied to successfully demonstrate the derivation of insights from multiple data sources without centralizing the underlying data.
Our goal is to assess whether it is technically feasible for PADME to be applied to data sharing use cases in supply chains, to explore the applicability and adaptation needs of the approach to the specific requirements of the supply chain domain, and to derive insights on and develop functions for facilitating data sharing in supply chains in general. Our method combines a literature review for supply-chain-specific requirements for data sharing, the identification of a use case for data sharing in the supply chain domain, and a prototyping approach focused on implementing a proof-of-concept deployment of PADME to address the identified use case.
The remainder of this paper is structured as follows. In Section 2, we examine collaborative supply chain data sharing and introduce supplier ratings as a specific use case. Furthermore, we compile an overview of requirements for cross-company data sharing, discuss the current state of the art of technical solutions addressing those requirements, and identify a lack of existing practical implementation examples. In Section 3, we simulate a use case of cross-company KPI exchange by synthesizing sample datasets to which we apply the PADME distributed analytics framework, detailing our technical implementation and extensions. Section 4 discusses our results with regard to their performance and limitations, and proposes approaches to address these limitations alongside further areas for future work.

2. Literature Review and Requirement Analysis

The following section first examines general tasks and issues in supply chain data analysis. Subsequently, the section reviews supply chain collaboration with a focus on data-driven decision support across company boundaries. We introduce the use case of exchanging supplier performance metrics to support supplier selection decisions. We further analyze the requirements for such systems and evaluate how current frameworks and technologies address these needs.

2.1. Data Analysis in Supply Chains

Supply chain data analytics span a wide range of tasks, including forecasting demand, optimizing production schedules, identifying bottlenecks, and monitoring operational performance [18]. Measuring figures such as supplier delivery reliability, production throughput, or inventory turnover plays a critical role in providing structured, quantitative insights into process performance. Internally, companies typically compute such metrics based on structured data extracted from operational databases such as enterprise resource planning systems or warehouse management systems. Transactional records, such as purchase orders, shipment confirmations, and production reports, often serve as the basis for these calculations. Even in internal systems, analytics efforts are often complicated by fragmented data sources, heterogeneous database schemas, inconsistent event recording practices, and variations in data granularity across departments [19]. As data analytics tasks are extended to an inter-organizational scale, additional complexities are introduced, which are discussed in the following.

2.2. Supply Chain Collaboration

Collaboration in supply chains occurs when two or more independent companies decide to work together to plan and execute supply chain operations more successfully than they could on their own [20]. This collaboration typically takes one of two primary forms [21], illustrated in Figure 1:
Vertical collaboration is established between entities at different levels of the supply chain, such as suppliers, manufacturers, distributors, and retailers. These partnerships aim to streamline processes, reduce delays, and optimize the product flow from production to customer delivery.
Horizontal collaboration is formed between organizations at the same supply chain level, including potential competitors or businesses within the same industry. Such collaboration typically focuses on expanding market reach through shared resources and capabilities, building reputation systems [22], and benchmarking [23].
Collaboration in supply chains can involve data sharing across organizational boundaries. When organizations exchange data, they can achieve goals such as improved services, enhanced operational efficiencies, and innovation in products or processes. Such data sharing activities can be driven by incentives including monetary compensation, reciprocal data sharing, or service benefits [24,25,26].
While supply chain collaboration can offer benefits, its implementation requires addressing complex challenges. Especially in horizontal collaboration, organizations must actively develop and maintain the balance of coopetition, a hybrid of cooperation and competition, where competing companies choose to collaborate for mutual benefits [27]. This requires careful consideration of data sharing mechanisms that protect individual interests while enabling collective advancement.
Given the multiple parameters and design choices in data sharing ecosystems, a use-case-driven approach enables focused analysis of requirements and challenges while allowing evaluation of transferability.

2.3. Use Case: Supplier Ratings

Selecting the right supplier is a difficult decision for many companies and organizations, with a possible significant impact on the ongoing performance of the organization [28].
In fact, supplier selection can be seen as one of the most crucial aspects of supply chain management. While academic literature on this topic is extensive, many industries still face challenges in implementing effective selection processes [29]. The evaluation typically considers multiple criteria, including price, flexibility, quality, delivery, and service. More recently, external factors such as political or environmental considerations have also gained importance [30]. Selecting appropriate suppliers ensures that goods and services are delivered in the right quantities, at the right price, and at the right time. The selection criteria are described by several key performance indicators (KPIs).
One KPI that could be computed using cross-company data sharing is the on-time delivery score of suppliers. This metric plays a significant role in supplier selection, as a supplier’s performance with regard to delivering on time directly impacts a company’s operational efficiency [31]. For companies with limited or no experience with a particular supplier, evaluating performance metrics becomes particularly challenging. The ability to build this KPI from other companies’ experiences could provide valuable decision support, thus presenting a relevant use case.

2.4. Requirement Analysis and Current Technologies

To enable such collaborative scenarios, we examine the technical approaches and solutions proposed in the literature. While business and organizational aspects play an important role, we focus particularly on the technical feasibility and implementation of data sharing solutions. These range from fundamental privacy and security mechanisms to governance frameworks and concrete analytics solutions, each addressing distinct requirements of collaborative data sharing.
Privacy and security form the foundation for cross-organizational data sharing solutions [32]. At the most basic level, privacy is ensured through data anonymization techniques, while differential privacy provides formal guarantees for protecting records of individuals [33]. For security, solutions range from basic encryption to advanced approaches like secure multi-party computation, which offers mathematical guarantees for secure distributed computations. In industrial settings, trusted sensors and execution environments (TEEs) enable end-to-end security for distributed data processing [34]. These fundamental mechanisms establish the essential trust required for organizations to share sensitive data.
Building on these fundamentals, organizations need to establish trust and governance structures to enable sustainable data sharing partnerships and ensure mutual value creation. Blockchain technology can help build trust between organizations by providing transparent and verifiable records of all data transactions [35]. Governance frameworks define the rules and policies for data sharing—in federated models like Gaia-X, organizations collectively establish shared policies, data standards, and compliance rules [15].
For data sovereignty and control, organizations may employ smart contracts to automate access and usage policy enforcement [36]. Initiatives such as the International Data Spaces implement this through a decentralized architecture where data providers maintain control over the access to their data [37].
Interoperability standards enable consistent interpretation and seamless technical exchange of data across organizations [38]. Various domains have established specific standards—for instance, SNOMED⁴ in healthcare or ISO standards in manufacturing ensure that data maintain their meaning across organizational boundaries [39]. However, manufacturing supply chains lack comprehensive standards that cover all production parts and processes, especially for composite parts and complex assemblies. Initiatives like the Digital Product Passport have the potential to address these standardization challenges in supply chains [40]. There is a need to standardize not only the terminology, but also the exchange process. Data exchange protocols need to be established. For example, IDS implements a standardized protocol through specifically developed software components (“Connectors”) that provide API-based data exchange while integrating usage policies and access control at the technical interface level [14].
Data quality assessment is particularly challenging in supply chains, as quality requirements are heavily use-case dependent and differ across organizations—what constitutes high-quality data for one partner’s analytics might be insufficient for another’s process control. While generic data quality frameworks define metrics such as completeness, consistency, and timeliness, their concrete interpretation needs to be negotiated for each cross-company data sharing scenario [41].
Turning data into actionable insights requires data processing and applied analytics solutions. Solutions such as Apache Spark⁵ and Apache Flink⁶ address data processing on distributed systems and can handle large data transfers. In the area of applied analytics, traditional centralized approaches where data are pooled for analysis in a single location offer straightforward implementation and direct control over the analytics process. However, when data sovereignty or privacy concerns prevent centralization, distributed methods such as federated learning enable collaborative model training across organizations without raw data sharing, particularly for applications like image classification and prediction tasks [42]. Domain-specific implementations such as the Personal Health Train in the healthcare domain enable distributed analysis of medical datasets across multiple hospitals. The PHT follows a ‘train-to-data’ principle where analytics algorithms travel between participating hospitals, allowing them to collectively analyze patient data and derive medical insights while maintaining local data storage [17]. This approach uniquely ensures that no raw data are transferred; instead, it is the algorithm that is sent to the data, and only the results of the local computations are returned to the requester.
Table 1 summarizes the requirements and existing technologies.
The frameworks and approaches discussed above highlight the current landscape in enabling data sharing and analytics across organizational boundaries. Rather than attempting to develop a comprehensive theoretical framework that addresses all these aspects, we adopt a use case-driven approach where we apply a framework to the concrete scenario of collaborative supplier ratings in supply chains.
Among the analyzed approaches, distributed analytics—particularly the ‘train-to-data’ principle demonstrated by the PHT—offers a promising foundation for our use-case-driven development. To evaluate this approach in practice, we build upon PADME, a framework that has already proven successful in healthcare settings. We have chosen this framework over other comparable solutions such as Vantage6⁷, FAIR Data Train⁸, DataSHIELD⁹, and PySyft¹⁰ for its easy deployability. In Table 2, we compare the similarities and differences of the healthcare and supply chain domains, and find that the privacy and data protection requirements, the emphasis on stakeholder sovereignty, and the decentralized nature of the data support the case for PADME’s transferability to, and alignment with, the requirements of the supply chain domain.
To position our proposed approach within the broader landscape of distributed analytics methods, we compare PADME against federated learning and multi-party computation frameworks. The comparison analyzes the focus of each approach, its deployability, and its scope of analytics tasks, as summarized in Table 3.
PADME, federated learning, and multi-party computation represent fundamentally different approaches to distributed analytics.
PADME prioritizes data sovereignty by executing analytic tasks directly at data sources, allowing organizations to retain full control over their sensitive information. Its architecture, based on lightweight Dockerized deployments and flexible analytics scripts, offers high deployability with moderate technical effort. Given its design for decentralized data sources and its adaptability to various computation tasks, PADME aligns closely with the needs of supply chain collaboration.
Federated learning focuses on collaborative model training across decentralized datasets, aiming to produce a shared predictive model without moving raw data. Although it preserves a degree of privacy, it requires substantial technical setup, including compatible machine learning frameworks and ongoing synchronization of model updates. Its relevance to supply chain contexts is limited to specific use cases of predictive modeling tasks.
Multi-party computation emphasizes maximum privacy and security through cryptographic protocols, allowing joint computation without revealing private inputs. While offering strong theoretical privacy guarantees, multi-party computation imposes significant computational and communication overhead, reducing its deployability. Its rigid requirements for parties to jointly define functions and operators make it less suited for conducting a variety of analytics tasks.
In the following section, we detail how we employed PADME’s distributed analytics framework to enable collaborative supplier performance analysis across multiple organizations. This practical implementation allows us to evaluate both the technical feasibility and practical implications of the approach while continuously evaluating the implementation against the broader requirements for collaborative data analysis in supply chains.

3. Applying a Distributed Analytics Framework to a Supply Chain Use Case

For our practical implementation, we employed PADME [16] as a central component of our approach. PADME is designed to enable secure, decentralized analytics across multiple parties. Built on the Personal Health Train concept, it allows for the execution of complex analytics across distributed data sources without requiring the centralization of sensitive data.
PADME has been deployed in both the medical and social sciences domains [43,44]. In the medical field, the system has been implemented to allow researchers to access and analyze sensitive health data without compromising patient privacy and sovereignty, as demonstrated by its use in a study on federated learning for medical data analytics. In social sciences, PADME has been utilized to analyze sensitive, decentralized datasets for a sentiment analysis. These applications demonstrate PADME’s adaptability in cases where the generation of insights from multiple parties is needed while the sharing of data itself is restricted.
Building upon the framework’s capacity for decentralized analytics across distributed data sources, we apply PADME to a use case within the supply chain domain.

3.1. Use Case Setup

We selected the on-time delivery score as our test case for information sharing through PADME. The on-time delivery score is a KPI defined per supplier as [45]:

$$\text{On-time delivery} = \frac{\text{Number of on-time goods receipt positions}}{\text{Number of goods receipt positions}} \times 100\%$$
Thus, for this score to be of value, the KPI must be computed from transactional delivery data.
The required data are sourced from the following relations in ERP systems [45]:
OrderProcessing(OrderID, OrderLine, ItemID, ConfirmedDate, ItemType)
GoodsReceiptEntry(GoodsReceiptID, OrderID, OrderLine, ItemID, EntryDate, ItemType)
Figure 2 and Figure 3 present examples of these datasets.
Using these data, the on-time delivery score is computed as follows. For the supplier in question, four orders were placed: two delivered on time and two delayed. Applying the formula, the score is calculated as:
$$\text{On-time delivery} = \frac{2}{4} \times 100\% = 50\%$$
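As a concrete illustration, this computation can be expressed as a single SQL query over the two relations. The following sketch assumes the schema shown above, PostgreSQL as in our setup, and that a goods receipt position counts as on time when its EntryDate does not exceed the ConfirmedDate; the connection parameters are placeholders, and a per-supplier filter would be added analogously once supplier identifiers are modeled.

```python
# Minimal sketch of the on-time delivery computation over the assumed schema.
import psycopg2

QUERY = """
    SELECT 100.0 * COUNT(*) FILTER (WHERE g.EntryDate <= o.ConfirmedDate)
                 / COUNT(*) AS on_time_delivery_pct
    FROM GoodsReceiptEntry g
    JOIN OrderProcessing o
      ON g.OrderID = o.OrderID AND g.OrderLine = o.OrderLine
"""

with psycopg2.connect(host="localhost", dbname="erp", user="padme") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        score = cur.fetchone()[0]
        print(f"On-time delivery: {score:.1f}%")  # e.g., 50.0% for the example above
```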
For our use case of on-time delivery exchange between companies, we chose a simulation-based approach to flexibly explore a simple setup and practically execute the entire pipeline with low implementation barriers, while still considering all potential challenges that may arise. We set up sample databases in PostgreSQL on two separate machines to represent two distinct manufacturing companies interested in sharing information on the on-time delivery rate of suppliers. Each database contained 200 entries with randomized delivery data and supplier identifiers, simulating synthetic yet realistic datasets. The aim was to calculate averaged on-time delivery scores of suppliers across participating companies, such that an aggregated score could be retrieved without centrally collecting individual scores. While the use case would benefit from, and the PADME framework theoretically supports, a large number of participating companies, we limited our setup to two entities to focus on establishing a proof of concept. This approach allows us to validate the core functionalities of the framework.
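For illustration, the simulated Station databases can be populated with a short script along the following lines. This sketch mirrors our setup rather than reproducing the exact generator we used: it fills the two relations with 200 randomized order/receipt pairs; the delay distribution and item values are arbitrary assumptions, and supplier identifiers are elided for brevity.

```python
# Populate one simulated company's database with 200 randomized deliveries.
# Schema follows the OrderProcessing and GoodsReceiptEntry relations above;
# distributions and connection details are illustrative assumptions.
import datetime
import random

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="erp", user="padme")
with conn, conn.cursor() as cur:
    for order_id in range(1, 201):
        item_id = random.randint(100, 110)
        confirmed = datetime.date(2024, 1, 1) + datetime.timedelta(days=random.randint(0, 300))
        delay = random.choice([0, 0, 0, 0, 1, 2, 7])  # most positions arrive on time
        cur.execute(
            "INSERT INTO OrderProcessing (OrderID, OrderLine, ItemID, ConfirmedDate, ItemType)"
            " VALUES (%s, 1, %s, %s, 'raw material')",
            (order_id, item_id, confirmed),
        )
        cur.execute(
            "INSERT INTO GoodsReceiptEntry (GoodsReceiptID, OrderID, OrderLine, ItemID, EntryDate, ItemType)"
            " VALUES (%s, %s, 1, %s, %s, 'raw material')",
            (order_id, order_id, item_id, confirmed + datetime.timedelta(days=delay)),
        )
```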
With the simulation setup established, in the following, we delve into the technical details of how the PADME framework operates, and how we integrate the databases of the simulated companies.

3.2. The PADME Architecture

Figure 4 illustrates the PADME architecture.
PADME is built on the PHT concept, which enables distributed data analysis: data remain at their sources, analytic tasks are executed at those distributed sources, and only the results are transferred.
PADME’s architecture is composed of three main components: Stations, Trains, and Central Services (CS) [43,46].
The Stations are independent, self-contained units that manage both data storage and the execution of analytic tasks. Each Station either stores data locally or provides access points for it, running containerized analytic models via Docker. After completing an analysis, the results are returned to the CS or the relevant repositories, ensuring that only results, and not the underlying data, are transferred during the process.
The Trains are computational workflows that travel between these Stations to perform analytics. These Trains carry predefined algorithms to be run by the Stations on the data, such as computations, transformations, or statistical analysis. They are defined and stored in Train Depot repositories and are orchestrated by the CS. Once the analysis is completed, the results can be written back to the Train which returns them to the requester, keeping the raw data within its originating Station.
The Central Services manage the coordination of tasks by defining the sequence of Stations and Trains involved in each analysis. They also manage the Train Depot repositories that store and allow access to the analytic models. Additionally, the CS includes monitoring tools that ensure transparency and accountability throughout the execution of tasks. For the Stations to function within the PADME framework, they must first register with the CS. This registration process enables the CS to manage and coordinate the tasks assigned to each Station.
In our simulation, the components of PADME’s architecture correspond to the following entities:
  • Stations: The two machines in our setup represent the Stations in the PADME framework. Each machine simulates a separate company, with its own local PostgreSQL database storing supplier delivery data. The two simulated companies (Stations) must be properly registered and recognized by the CS before they can execute the analytic tasks.
  • Train: The analytic task in our case is implemented as a Python (3.10) script that is containerized using Docker. This script, representing the Train, is transmitted to the respective Stations. Upon reaching a Station, the script can be inspected and executed. The Train’s primary function is to request access to the local database, perform the calculation of the on-time delivery score as defined earlier, and store the result to return it to the requester (a minimal sketch of such a script follows this list).
  • Central Services: The CS are an existing application implemented by the PADME developers and can be compared to a train operation service. These services coordinate the transmission of the containerized Python script (Train) between the Stations.
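To make the Train component concrete, the sketch below outlines the structure of such a containerized script. The environment variable names, file paths, and results format are our illustrative assumptions, not PADME’s actual conventions; in our setup, the executing Station supplies the database connection details at runtime.

```python
# Illustrative Train script: connect to the Station's database, compute the
# on-time delivery score, and append it to the results list carried by the
# container. Variable names, paths, and the JSON format are assumptions.
import json
import os

import psycopg2

RESULTS_PATH = "/results/results.json"
QUERY = """
    SELECT 100.0 * COUNT(*) FILTER (WHERE g.EntryDate <= o.ConfirmedDate) / COUNT(*)
    FROM GoodsReceiptEntry g
    JOIN OrderProcessing o
      ON g.OrderID = o.OrderID AND g.OrderLine = o.OrderLine
"""

def main() -> None:
    # Connection details are injected by the executing Station, not shipped
    # with the Train, so credentials never leave the company.
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(QUERY)
        score = float(cur.fetchone()[0])

    # Only the aggregated score is appended; no raw rows are written back.
    results = []
    if os.path.exists(RESULTS_PATH):
        with open(RESULTS_PATH) as f:
            results = json.load(f)
    results.append({"station": os.environ.get("STATION_ID", "unknown"),
                    "on_time_delivery_pct": score})
    with open(RESULTS_PATH, "w") as f:
        json.dump(results, f, indent=2)

if __name__ == "__main__":
    main()
```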
Requests to CS for a Train to roll can be initiated by any authorized user. While this flexibility is essential for operational efficiency, it also necessitates strict user access control and activity monitoring. Ensuring that only authorized users can submit requests helps maintain the integrity of the system and prevents unauthorized interference with the analytic processes or data exchanges. In our use case, the idea is for companies registering as participants to be both potential data providers as well as authorized data requesters.
To address this, PADME integrates identity and access management mechanisms, such as Keycloak¹¹, to control access to the platform, ensuring that only authorized users can interact with it.

3.3. Use Case Execution

Figure 5 marks the execution steps of PADME along its architecture. The process of on-time delivery score calculation is visualized as a flowchart in Figure 6.
At Step 1, a company initiates a request for an on-time delivery score calculation through CS. The process is illustrated in Figure 7. Upon receiving the request, the CS triggers the creation of a Train ride, stored within a Docker container, and pushes it to the next company on the line.
At Step 2, the company pulls the Docker image and can inspect the contents of the container before deciding whether to allow its execution; an example snippet is shown in Figure 8. Should the company choose to reject the execution, the Docker image appends no results, only any rejection reasons provided by the company. The Train is then pushed to the next Station.
If a company decides to allow the Train to execute its script, it is required to provide the necessary database connection details to the Docker container to enable the Train’s access to its local database. The Train then calculates the on-time delivery score based on the data available in the company’s database. Upon successful calculation, the resulting score is appended to a predefined results list, stored within the Docker container along with the script. The updated container is pushed back to CS.
At Step 3, once the ride is complete, the requester can pull the updated container, which now contains the appended on-time delivery scores, allowing the requester to access the results of the analysis performed on other companies’ data without transferring the underlying data itself. A sample results view is illustrated in Figure 9.
PADME is connected to a user interface through an Application Programming Interface (API) integration that enables interaction between the two systems. This integration allows users to trigger the execution of a Train through the interface. By using the API, the user interface can send requests directly to PADME’s CS. The interface abstracts the complexity of the underlying process, allowing users to specify the data, select the appropriate analysis, and initiate the execution with minimal technical knowledge. Upon a user’s action, the API sends the necessary parameters to CS, which deploys the corresponding Train to the designated Stations. The results are then retrieved and displayed back to the user, completing the workflow.
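For illustration, such a trigger might look like the following. The endpoint, payload fields, and token handling are hypothetical stand-ins; we do not reproduce the actual PADME API here.

```python
# Hypothetical request from the user interface to the Central Services to
# start a Train ride; endpoint and field names are illustrative only.
import requests

payload = {
    "train_image": "registry.example.org/trains/on-time-delivery:1.0",
    "route": ["station-company-a", "station-company-b"],  # visiting order
    "requester": "company-c",
}
resp = requests.post(
    "https://cs.example.org/api/rides",
    json=payload,
    headers={"Authorization": "Bearer <access token obtained via Keycloak>"},
    timeout=30,
)
resp.raise_for_status()
print("Ride accepted:", resp.json())
```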

3.4. Data Privacy

Aside from the calculation and transmission of the KPI, we have explored how to extend the application with privacy-preserving functionalities. We focused on integrating mechanisms based on k-anonymity and l-diversity to ensure that the shared data are protected against re-identification and unauthorized inference. Firstly, k-anonymity is a privacy model that ensures that each record in a dataset is indistinguishable from at least k-1 other records with respect to certain identifying attributes [47]. This means that any individual data point cannot be uniquely identified based on the combination of its quasi-identifiers. Secondly, l-diversity is a further enhancement to k-anonymity, where the sensitive attributes in each group of k records must have at least l “well-represented” values, ensuring that the data do not expose sensitive information through homogeneity or skewness in sensitive values [48].
In the context of our use case, k-anonymity and l-diversity can be applied to the supplier data used to calculate the on-time delivery score. For example, consider a dataset containing supplier names, delivery times, and delivery statuses. If this dataset were shared between companies, it could potentially expose sensitive details such as the identity of the data provider, e.g., if it is known that the company had a partnership with the supplier in question at the given times. To apply k-anonymity, we could generalize or suppress certain quasi-identifiers (e.g., supplier names or delivery dates) such that each record is indistinguishable from at least k-1 other records. For instance, if the threshold for k-anonymity is set to 3, any given supplier’s record must be grouped with at least two other suppliers with the same or similar delivery characteristics, thereby preventing any data provider from being uniquely identified. From a technical perspective, this could involve aggregating supplier delivery data into broader categories, such as grouping suppliers by industry or geographical region, or binning delivery times into ranges (e.g., 0–5 days, 6–10 days) to reduce the granularity of identifiable data.
To ensure l-diversity, we would need to ensure that for each group of k records, the values for sensitive attributes like delivery status or delay reasons are sufficiently diverse. For example, if all records in a group of k suppliers have the same delivery delay reason, the group would not meet the l-diversity requirement unless there are multiple delay reasons present.
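The sketch below illustrates both checks on a toy generalized dataset; the attribute names, binning scheme, and group structure mirror the examples in the text and are assumptions for illustration, not a concrete production implementation.

```python
# Illustrative check of k-anonymity and l-diversity on generalized supplier
# data using pandas; attribute names and bins are assumptions.
import pandas as pd

df = pd.DataFrame({
    "region":        ["EU", "EU", "EU", "ASIA", "ASIA", "ASIA"],
    "delivery_days": [3, 4, 2, 7, 9, 6],
    "delay_reason":  ["none", "logistics", "none", "customs", "none", "logistics"],
})

# Generalize the quasi-identifier: bin delivery times into ranges.
df["delivery_bin"] = pd.cut(df["delivery_days"], bins=[0, 5, 10],
                            labels=["0-5 days", "6-10 days"])

quasi_identifiers = ["region", "delivery_bin"]
groups = df.groupby(quasi_identifiers, observed=True)

k = groups.size().min()                        # size of smallest equivalence class
l_div = groups["delay_reason"].nunique().min() # least diverse sensitive attribute
print(f"Dataset satisfies {k}-anonymity and {l_div}-diversity")
```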

3.5. Data Quality Assessment

In parallel to our primary work with the PADME framework, we explored additional functionalities that could be integrated with such a system in future implementations.
One aspect of this exploration focused on developing a simple user interface for data management and quality assurance, which could play a key role in ensuring that data shared across companies are both accurate and reliable. We designed a prototype interface which includes a feature called Register Data Mode. In this mode, users can upload and review data before they are shared for collaborative analysis. The interface provides a series of checks to assess the quality of the data, which we call the data quality assistant. Figure 10 presents a simplified view of the interface.
The assistant performs several key functions (a minimal sketch follows this list):
  • First, it checks for data completeness, ensuring that all necessary attributes are present;
  • Second, it verifies the plausibility of the data, filtering out unrealistic or inconsistent entries;
  • Finally, it applies validity thresholds to ensure that only high-quality data are used for analysis.
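The following sketch shows these three checks, assuming delivery records arrive as a pandas DataFrame; the required columns, plausibility rule, and 95% validity threshold are illustrative choices, not the assistant’s exact configuration.

```python
# Minimal sketch of the data quality assistant's checks; column names,
# plausibility bounds, and the threshold are illustrative assumptions.
import pandas as pd

REQUIRED_COLUMNS = ["OrderID", "ConfirmedDate", "EntryDate"]

def assess_quality(df: pd.DataFrame, validity_threshold: float = 0.95) -> bool:
    # 1. Completeness: all required attributes present and non-null.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        print(f"Rejected: missing attributes {missing}")
        return False
    complete = df[REQUIRED_COLUMNS].notna().all(axis=1)

    # 2. Plausibility: filter out unrealistic entries, e.g., receipts more
    #    than 30 days before confirmation or more than a year after it.
    delay = (pd.to_datetime(df["EntryDate"])
             - pd.to_datetime(df["ConfirmedDate"])).dt.days
    plausible = delay.between(-30, 365)

    # 3. Validity threshold: accept the dataset only if enough rows pass.
    valid_share = (complete & plausible).mean()
    print(f"{valid_share:.0%} of rows pass the checks")
    return valid_share >= validity_threshold
```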
This functionality demonstrates how data quality checks could complement the collaborative analysis process. Future work may focus on integrating such a quality assurance tool to create a more robust and reliable environment for data-driven decision making across organizations, and on enhancing the assistant to not only facilitate assessing standard data quality metrics, but also metrics that are of use-case-specific importance. This would require a more in-depth analysis of which data quality metrics are relevant for a use case.

4. Discussion

In this section, we provide a structured discussion of our research findings. We begin with a summary and discussion of the key contributions, followed by an analysis of the limitations of the current study, and conclude with directions for future research.

4.1. Key Contributions

Our work demonstrates the technical feasibility and practical potential of distributed analytics for cross-company data sharing in the supply chain domain. The following key contributions were achieved:
  • Domain-specific requirements: We identified and structured key technical and organizational requirements for supply chain data sharing.
  • Adapting PADME to supply chain scenarios: We compared the requirements of the supply chain domain to the healthcare domain, and transferred a suitable framework from the healthcare domain to a use case of calculating on-time delivery metrics across companies.
  • Enabling sovereignty-conscious KPI exchange: The system enables KPI computation without requiring centralization of sensitive data, preserving data sovereignty.
  • Integrating privacy and quality functions: We have explored the integration of privacy-preserving techniques (k-anonymity, l-diversity) and the implementation of a data quality assistant to support trust and data integrity.
These contributions collectively demonstrate that privacy-preserving, cross-company data analytics is technically feasible and practically relevant in supply chain contexts. By grounding the design in domain-specific requirements, the work moves beyond abstract architecture to address concrete challenges faced by companies, such as maintaining sovereignty over sensitive operational data while still participating in shared performance evaluation.
The successful adaptation of a framework from healthcare to supply chain analytics not only confirms the architectural flexibility of PADME, but also shows that common concerns around decentralization and trust span multiple domains. This cross-domain transfer strengthens the generalizability of algorithm-to-data approaches and suggests a broader applicability of existing technologies in industrial collaboration.
The use case of on-time delivery illustrates how KPI exchange across firms can improve supplier evaluation and benchmarking. This is particularly valuable for companies with limited transactional history with a supplier, as performance insights can be aggregated from peer organizations in a privacy-preserving manner. The modularity of the PADME framework further allows the definition and execution of custom analytics, enabling adaptation to other KPIs or operational contexts. Examples of such variations could be the calculation of KPIs like defect rate or lead time.
Our work offers a concrete proof-of-concept that connects technologies to practical supply chain needs, making a case for distributed analytics as a viable strategy for supply chain collaboration.

4.2. Limitations

While our implementation demonstrates the feasibility of distributed analytics in supply chain settings, several limitations remain that warrant further investigation:
  • Limited scale of evaluation: Our prototype setup involved only two simulated companies with synthetic data. While PADME theoretically supports larger networks, the evaluation of scalability in real-world, multi-actor supply chains was beyond the scope of this study.
  • Manual approval workflow: The current data sovereignty model requires each company to manually accept or reject each analytic request. While this supports strict data control, it introduces administrative overhead and may hinder responsiveness in time-sensitive scenarios.
  • Central Services as a single point of coordination: The need for participating Stations to register with the CS introduces a potential point of vulnerability. If compromised, the CS could expose metadata or enable unauthorized train requests, underlining the need for robust access and monitoring mechanisms.
  • Dependence on schema consistency: The analytic script assumes a predefined schema for database access. If a company uses alternative column names or data structures, the train will be unable to perform the intended computation.
  • Privacy–utility trade-off: While privacy-enhancing techniques such as k-anonymity and l-diversity reduce re-identification risks, they may limit the granularity or interpretability of results. Future research should explore whether meaningful analysis of performance trends remains possible under stricter privacy constraints.

4.3. Future Research Directions

Building on the current proof-of-concept implementation, several directions emerge for future research. A key priority lies in evaluating the scalability of the framework in real-world, large-scale supply chain networks. This includes addressing performance considerations such as processing efficiency on large datasets and reducing bottlenecks caused by centralized orchestration. One promising approach may involve restructuring the architecture into distributed subclusters that coordinate analysis locally and report to a higher-level control service [49].
In parallel, further development of the data sovereignty model is needed to reduce administrative overhead without compromising control. Automating access decisions based on predefined consent protocols, metadata, or train characteristics could streamline participation, especially in dynamic or high-frequency analytic scenarios. Integrating artificial intelligence into consent decision making may also enhance flexibility and user trust.
Another area for exploration would be schema interoperability. As the analytic scripts currently rely on predefined data formats, future work should consider onboarding support or vocabulary mapping services that align heterogeneous schemas across companies [13]. This could significantly improve the generalizability and ease of adoption across diverse supply chain contexts.
From a privacy perspective, future implementations may benefit from expanding the range of supported techniques, such as incorporating differential privacy or adaptive aggregation strategies. This could offer more flexible trade-offs between data protection and analytic utility, especially when evaluating more complex KPIs or time series data.
Finally, enabling real-world adoption at scale will require integrating technical solutions with business model and governance frameworks. This includes establishing legal agreements for data usage, providing compensation models for data contributions, and ensuring transparency in ecosystem participation [50]. Initiatives like IDS that combine technical connectors with contractual enforcement mechanisms represent a promising foundation in this regard [14].
Taken together, these research directions may help mature the proposed system into a scalable and widely applicable solution for cross-company data collaboration in supply chains.

Author Contributions

Conceptualization, S.-Y.K. and S.B.; methodology, S.-Y.K. and S.B.; software, S.-Y.K.; validation, S.-Y.K.; formal analysis, S.-Y.K. and S.B.; investigation, S.-Y.K. and S.B.; resources, S.-Y.K.; data curation, S.-Y.K.; writing—original draft preparation, S.-Y.K. and S.B.; writing—review and editing, M.K., M.P. and S.G.; visualization, S.-Y.K. and M.P.; supervision, M.P. and S.G.; project administration, S.-Y.K. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—EXC-2023 Internet of Production—390621612.

Data Availability Statement

The datasets presented in this article are not readily available as the data are part of an ongoing study. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
KPI   Key Performance Indicator
PADME   Platform for Analytics and Distributed Machine Learning for Enterprises
PHT   Personal Health Train
IDS   International Data Spaces
TEEs   Trusted Execution Environments
CS   Central Services
API   Application Programming Interface
AI   Artificial Intelligence

Notes

1.
2. https://padme-analytics.de/ (accessed on 18 March 2025)
3.
4. https://www.snomed.org/ (accessed on 18 March 2025)
5. https://spark.apache.org (accessed on 18 March 2025)
6. https://flink.apache.org (accessed on 18 March 2025)
7. https://distributedlearning.ai/ (accessed on 18 March 2025)
8. https://specs.fairdatatrain.org/ (accessed on 18 March 2025)
9. https://datashield.org/ (accessed on 18 March 2025)
10. https://docs.openmined.org/en/latest/ (accessed on 18 March 2025)
11. https://www.keycloak.org/ (accessed on 18 March 2025)

References

  1. Meier, M.; Pinto, E. COVID-19 Supply Chain Disruptions. Eur. Econ. Rev. 2024, 162, 104674.
  2. Cedillo-Campos, M.G.; González-Ramírez, R.G.; Mejía-Argueta, C.; González-Feliu, J. Special issue: Data-driven decision making in supply chains. Comput. Ind. Eng. 2020, 139, 106022.
  3. Lotfi, Z.; Mukhtar, M.; Sahran, S.; Zadeh, A.T. Information Sharing in Supply Chain Management. Procedia Technol. 2013, 11, 298–304.
  4. Linnartz, M.; Leckel, A. Data Sharing im Supply-Chain-Management. Z. Wirtsch. Fabr. 2020, 115, 563–566.
  5. Alicke, K.; Rexhausen, D.; Seyfert, A. Supply Chain 4.0 in Consumer Goods; McKinsey & Company: Chicago, IL, USA, 2017; Volume 1, pp. 1–11.
  6. Lee, H.L.; Whang, S. Information sharing in a supply chain. Int. J. Manuf. Technol. Manag. 2000, 1, 79–93.
  7. Zhou, H.; Benton, W. Supply chain practice and information sharing. J. Oper. Manag. 2007, 25, 1348–1365.
  8. Patel, J. Bridging data silos using big data integration. Int. J. Database Manag. Syst. 2019, 11, 1–6.
  9. Scerri, S.; Tuikka, T.; de Vallejo, I.L.; Curry, E. Common European Data Spaces: Challenges and Opportunities. In Data Spaces: Design, Deployment and Future Directions; Springer International Publishing: Cham, Switzerland, 2022; pp. 337–357.
  10. Malin, B.; Goodman, K. Between access and privacy: Challenges in sharing health data. Yearb. Med. Inform. 2018, 27, 055–059.
  11. Kumar, R.S.; Pugazhendhi, S. Information Sharing in Supply Chains: An Overview. Procedia Eng. 2012, 38, 2147–2154.
  12. Oliveira, M.I.S.; Barros Lima, G.d.F.; Farias Lóscio, B. Investigations into data ecosystems: A systematic mapping study. Knowl. Inf. Syst. 2019, 61, 589–630.
  13. Geisler, S.; Vidal, M.E.; Cappiello, C.; Lóscio, B.F.; Gal, A.; Jarke, M.; Lenzerini, M.; Missier, P.; Otto, B.; Paja, E.; et al. Knowledge-Driven Data Ecosystems Toward Data Transparency. J. Data Inf. Qual. 2021, 14, 1–12.
  14. Otto, B.; Hompel, M.t.; Wrobel, S. International Data Spaces. In Digital Transformation; Neugebauer, R., Ed.; Springer: Berlin/Heidelberg, Germany, 2019; pp. 109–128.
  15. Braud, A.; Fromentoux, G.; Radier, B.; Le Grand, O. The Road to European Digital Sovereignty with Gaia-X and IDSA. IEEE Netw. 2021, 35, 4–5.
  16. Welten, S.; Mou, Y.; Neumann, L.; Jaberansary, M.; Yediel Ucer, Y.; Kirsten, T.; Decker, S.; Beyan, O. A Privacy-Preserving Distributed Analytics Platform for Health Care Data. Methods Inf. Med. 2022, 61, e1–e11.
  17. Beyan, O.; Choudhury, A.; van Soest, J.; Kohlbacher, O.; Zimmermann, L.; Stenzhorn, H.; Karim, M.R.; Dumontier, M.; Decker, S.; da Silva Santos, L.O.B.; et al. Distributed Analytics on Sensitive Medical Data: The Personal Health Train. Data Intell. 2020, 2, 96–107.
  18. Jain, A.D.S.; Mehta, I.; Mitra, J.; Agrawal, S. Application of big data in supply chain management. Mater. Today Proc. 2017, 4, 1106–1115.
  19. Moktadir, M.A.; Ali, S.M.; Paul, S.K.; Shukla, N. Barriers to big data analytics in manufacturing supply chains: A case study from Bangladesh. Comput. Ind. Eng. 2019, 128, 1063–1075.
  20. Saenz, M.J.; Ubaghs, E.; Cuevas, A.I. Vertical Collaboration and Horizontal Collaboration in Supply Chain. In Enabling Horizontal Collaboration Through Continuous Relational Learning; Springer International Publishing: Cham, Switzerland, 2014; pp. 7–10.
  21. Björnfot, A.; Torjussen, L.; Erikshammar, J. Horizontal supply chain collaboration in Swedish and Norwegian SME networks. In Proceedings of the 19th Annual Conference of the International Group for Lean Construction 2011, IGLC 2011, Lima, Peru, 13–15 July 2011; pp. 340–349.
  22. Hemmrich, S.; Schäfer, J.; Hansmeier, P.; Beverungen, D. The Value of Reputation Systems in Business Contexts—A Qualitative Study Taking the View of Buyers. In Proceedings of the Hawaii International Conference on System Sciences 2024 (HICSS-57), Honolulu, HI, USA, 3–6 January 2024; Available online: https://scholarspace.manoa.hawaii.edu/items/0405ff3b-a6c5-4f57-8d91-0c05d23a37e9 (accessed on 21 May 2025).
  23. Pennekamp, J.; Lohmöller, J.; Vlad, E.; Loos, J.; Rodemann, N.; Sapel, P.; Fink, I.B.; Schmitz, S.; Hopmann, C.; Jarke, M.; et al. Designing Secure and Privacy-Preserving Information Systems for Industry Benchmarking. In Proceedings of the Advanced Information Systems Engineering; Indulska, M., Reinhartz-Berger, I., Cetina, C., Pastor, O., Eds.; Springer: Cham, Switzerland, 2023; pp. 489–505.
  24. Gelhaar, J.; Gürpinar, T.; Henke, M.; Otto, B. Towards a Taxonomy of Incentive Mechanisms for Data Sharing in Data Ecosystems. In Proceedings of the Pacific Asia Conference on Information Systems (PACIS), Virtual, 12–14 July 2021; p. 121.
  25. Yu, Z.; Yan, H.; Edwin Cheng, T. Benefits of information sharing with supply chain partnerships. Ind. Manag. Data Syst. 2001, 101, 114–121.
  26. European Parliament. Boosting Data Sharing in the EU: What Are the Benefits? 2022. Available online: https://www.europarl.europa.eu/topics/en/article/20220331STO26411/boosting-data-sharing-in-the-eu-what-are-the-benefits (accessed on 9 December 2024).
  27. Paja, E.; Jarke, M.; Otto, B.; Piller, F.T. 4.3 The Business of Data Ecosystems. In Dagstuhl Seminar 19411 Social Agents for Teamwork and Group Interactions; Schloss Dagstuhl: Wadern, Germany, 2020.
  28. Bhutta, M.; Huq, F. Supplier selection problem: A comparison of the total cost of ownership and analytic hierarchy process approaches. Supply Chain Manag. Int. J. 2002, 7, 126–135.
  29. van der Westhuizen, J.; Ntshingila, L. The effect of supplier selection, supplier development and information sharing on SME’s business performance in Sedibeng. Int. J. Econ. Financ. Stud. 2020, 12, 153–167.
  30. Diabat, A.; Khodaverdi, R.; Olfat, L. An exploration of green supply chain practices and performances in an automotive industry. Int. J. Adv. Manuf. Technol. 2013, 68, 949–961.
  31. Niemi, T.; Hameri, A.P.; Kolesnyk, P.; Appelqvist, P. What is the value of delivering on time? J. Adv. Manag. Res. 2020, 17, 473–503.
  32. Hong, Y.; Vaidya, J.; Wang, S. A survey of privacy-aware supply chain collaboration: From theory to applications. J. Inf. Syst. 2014, 28, 243–268.
  33. Dwork, C. Differential Privacy. In Proceedings of the Automata, Languages and Programming; Bugliesi, M., Preneel, B., Sassone, V., Wegener, I., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12.
  34. Pennekamp, J.; Alder, F.; Matzutt, R.; Muhlberg, J.T.; Piessens, F.; Wehrle, K. Secure End-to-End Sensing in Supply Chains. In Proceedings of the 2020 IEEE Conference on Communications and Network Security (CNS), Virtual, 29 June–1 July 2020; pp. 1–6.
  35. Schönle, D.; Wallis, K.; Stodt, J.; Reich, C.; Welte, D.; Sikora, A. Industry use cases on blockchain technology. In Industry Use Cases on Blockchain Technology Applications in IoT and the Financial Sector; IGI Global: Hershey, PA, USA, 2021; pp. 248–276.
  36. John, K.; Kogan, L.; Saleh, F. Smart Contracts and Decentralized Finance. Annu. Rev. Financ. Econ. 2023, 15, 523–542.
  37. Pettenpohl, H.; Spiekermann, M.; Both, J.R. International Data Spaces in a Nutshell. In Designing Data Spaces: The Ecosystem Approach to Competitive Advantage; Otto, B., ten Hompel, M., Wrobel, S., Eds.; Springer: Cham, Switzerland, 2022; pp. 29–40.
  38. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
  39. Otto, B.; Österle, H. Datenqualität—Eine Managementaufgabe. In Corporate Data Quality: Voraussetzung Erfolgreicher Geschäftsmodelle; Springer: Berlin/Heidelberg, Germany, 2016; pp. 1–44.
  40. Koppelaar, R.H.E.M.; Pamidi, S.; Hajósi, E.; Herreras, L.; Leroy, P.; Jung, H.Y.; Concheso, A.; Daniel, R.; Francisco, F.B.; Parrado, C.; et al. A Digital Product Passport for Critical Raw Materials Reuse and Recycling. Sustainability 2023, 15, 1405.
  41. Linnartz, M.; Kim, S.Y.; Perau, M.; Schröer, T.; Geisler, S.; Decker, S. Unternehmensübergreifendes Datenqualitätsmanagement: Entwicklung eines Rahmenwerks zur Analyse der Stammdatenqualität in Kunden-Lieferanten-Beziehungen. Z. Wirtsch. Fabr. 2022, 117, 851–855.
  42. Li, L.; Fan, Y.; Tse, M.; Lin, K.Y. A review of applications in federated learning. Comput. Ind. Eng. 2020, 149, 106854.
  43. Welten, S.; de Arruda Botelho Herr, M.; Hempel, L.; Hieber, D.; Placzek, P.; Graf, M.; Weber, S.; Neumann, L.; Jugl, M.; Tirpitz, L.; et al. A study on interoperability between two Personal Health Train infrastructures in leukodystrophy data analysis. Sci. Data 2024, 11, 663.
  44. Boukhers, Z.; Bleier, A.; Yediel, Y.U.; Hienstorfer-Heitmann, M.; Jaberansary, M.; Welten, S.; Koumpis, A.; Beyan, O. PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences. In Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Santa Fe, NM, USA, 26–30 June 2023; pp. 251–252.
  45. Logistik, T. Logistic Indicators for Procurement; Technical Report VDI 4400 Blatt 1; Verein Deutscher Ingenieure (VDI): Düsseldorf, Germany, 2001.
  46. Jugl, M.; Welten, S.; Mou, Y.; Yediel, Y.U.; Beyan, O.D.; Sax, U.; Kirsten, T. Privacy-Preserving Linkage of Distributed Datasets using the Personal Health Train. arXiv 2023, arXiv:2309.06171.
  47. Sweeney, L. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 2002, 10, 557–570.
  48. Machanavajjhala, A.; Kifer, D.; Gehrke, J.; Venkitasubramaniam, M. l-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data (TKDD) 2007, 1, 3.
  49. Tirpitz, L. Towards FAIR Data Stream Processing Ecosystems. In Proceedings of the 18th ACM International Conference on Distributed and Event-Based Systems, Lyon, France, 25–28 June 2024; pp. 203–206.
  50. Jarke, M. Data Sovereignty and the Internet of Production. In Proceedings of the Advanced Information Systems Engineering; Dustdar, S., Yu, E., Salinesi, C., Rieu, D., Pant, V., Eds.; Springer: Cham, Switzerland, 2020; pp. 549–558.
Figure 1. Illustration of vertical and horizontal collaboration types in supply chains.
Figure 2. Sample OrderProcessing table.
Figure 3. Sample GoodsReceiptEntry table.
Figure 4. The PADME architecture.
Figure 5. The process steps of PADME marked along its architecture.
Figure 6. A detailed flowchart of the on-time delivery score calculation with PADME.
Figure 7. Step 1: Requesting a train for calculating a supplier’s on-time delivery score from external data.
Figure 8. Step 2: Inspecting the script loaded on the Docker container.
Figure 9. Step 3: Viewing the appended results file.
Figure 10. Simplified view of the data quality assistant.
Table 1. Requirements for cross-company data sharing and current technologies to address them.

| Requirement | Technologies |
|---|---|
| Data privacy | Data anonymization; differential privacy |
| Security | Encryption; multi-party computation; trusted execution environments |
| Trust and governance | Blockchain; federated governance models |
| Data sovereignty and control | Smart contracts; decentralized access control |
| Interoperability | Vocabularies and ontologies; standardized data exchange protocols |
| Data quality | Collaborative data quality management |
| Actionable insights | Centralized analytics; distributed analytics |
Table 2. Comparison of the requirements in the healthcare and supply chain domain.

| Dimension | Healthcare Domain | Supply Chain Domain |
|---|---|---|
| Data privacy | Mandatory due to regulation (e.g., GDPR) | Critical due to business confidentiality and competition |
| Security | High sensitivity (e.g., patient data, GDPR-regulated) | High sensitivity (e.g., trade secrets, supplier performance data) |
| Trust and governance | Strong legal regulations | Emerging governance frameworks (e.g., IDS, Gaia-X) |
| Data sovereignty and control | Distributed data across hospitals, individual ownership of patients | Distributed data and ownership across independent companies in the supply chain |
| Interoperability | Different systems, medical terminologies (e.g., SNOMED) | Heterogeneous systems, varying KPIs and taxonomies |
| Data quality | Varies across institutions; supported by standards like HL7/FHIR | Inconsistent across firms; lacks standardized KPIs and formats |
| Actionable insights | Research collaboration, improving public health outcomes | Operational optimization and benchmarking |
Table 3. Comparison of distributed analytics frameworks.

| Framework | Focus | Deployability | Scope of Analytics Tasks |
|---|---|---|---|
| PADME | Data sovereignty and decentralized analytics | High (Dockerized, moderate setup effort) | High (suited for a wide variety of computation tasks) |
| Federated learning | Collaborative machine learning model training | Medium (requires ML frameworks, model synchronization, substantial setup) | Medium (suited specifically for predictive modeling tasks) |
| Multi-party computation | Strong cryptographic security for joint computation | Low (computationally intensive, complex integration) | Low (strict requirements for defining functions) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
