A Secure CDM-Based Data Analysis Platform (SCAP) in Multi-Centered Distributed Setting

Abstract: Hospitals have their own database structures and maintain their data in a closed manner. For this reason, it is difficult for researchers outside of institutions to access multi-center data. If the data maintained by all hospitals follow a commonly shared format, however, researchers can analyze multi-center data using the same method. The objective of this study is to propose and implement the processes for distributing and executing analysis codes and returning the results, so that data based on a common data model (CDM) can be analyzed safely in a distributed multi-center network environment. The secure CDM-based data analysis platform (SCAP) consists of a certificate authority (CA), authentication server (AS), code signer (CS), ticket-granting server (TGS), relaying server (RS), and service server (SS). The AS, CS, TGS, and RS form the central server group of the platform. An SS is installed on a hospital server as an agent for communication with the server group. We designed the functionalities and communication protocols among servers. To safely conduct the intended functions, the proposed protocol was implemented based on cryptographic algorithms. The SCAP was developed as a web application running on this protocol. Users access the platform through a web-based interface.


Introduction
The importance of medical data has been emphasized over the past several decades, and numerous medical institutions around the world have built systems to accumulate medical records. Following this trend, electronic medical record (EMR) or electronic health record (EHR) systems have been actively adopted in local hospital networks [1,2]. With the recent development of big data and artificial intelligence technologies, researchers are attempting to use large volumes of medical data for secondary purposes such as the research and development of patient-centered services. However, it is unreasonable to derive statistically meaningful results using only the accumulated data in a single hospital. To overcome this problem, studies have been proposed to analyze data collected from several medical institutions, including multi-center or multi-source datasets [3,4].
However, despite the apparent advantages of multi-center datasets, the following practical problems occur: (1) Hospitals maintain medical data inside the local network boundary of the institution for management and security reasons. Because this setting leaves resources distributed over the network, it is necessary to consolidate medical data [3,5,6]. (2) In general, each hospital has its own database structure for storing medical data. Differences in data formats can be a stumbling block when merging and further analyzing medical data collected from institutions [7][8][9]. (3) Transmitting data from the hospital network without proper security can violate patient privacy [10]. (4) Even if medical data are transmitted through secure network channels among institutions, patients may not want personal information to be included in the transmission [11][12][13][14]. To effectively analyze data collected from various hospitals, these issues should be resolved.
To resolve these problems, there has been an increasing number of attempts to build a data-driven research infrastructure based on the common data model (CDM). A medical CDM is a standardized format for clinical or observational data, whose native formats differ from hospital to hospital. Therefore, based on the same CDM format, investigators do not need to write multiple analysis procedures to analyze data from multiple hospitals [15]. Another advantage of a CDM is that it makes it easier for hospitals to share the same or similar infrastructure to store and analyze their data.
Currently, many consortiums in each country develop platforms to support multi-institutional medical research based on the above characteristics [3,16,17]. Although these platforms have improved the research environment and convenience, proper security should be considered, as they process medical data including patients' personal information. General security requirements for medical information system design and operation are stated in ISO/IEC 27799:2016 [18]: "information security policy", "organization of information security", "human resource security", "asset management", "access control", "cryptography", "physical and environmental security", "operational security", "communications security", "system acquisition, development, and maintenance", "supplier relationships", "information security incident management", "information security aspects of business continuity management", and "compliance". From a technical point of view, these requirements ensure that only authorized users perform authorized operations and that the integrity of the results is preserved. The most important techniques to achieve this are user authentication [19,20] and digital signatures [21,22]. These techniques are used as part of many modern security protocols. For example, transport layer security (TLS) [10] supports authentication between communicating parties, data encryption, and integrity guarantees for communication security. Although this protocol protects communication in a general way, it does not provide security specific to the business logic of a medical system, such as a home health care service, between the user and the server. This problem is exacerbated in a multi-institutional medical data research environment where medical data are distributed on a network. Kerberos [23] provides authentication services in such an environment, but it proves a user's identity to only one server at a time and does not protect the data sent and received afterward. Therefore, a specialized security protocol is required to design a system to support multi-institutional medical data-based research.
In this paper, we propose a secure CDM-based data analysis platform (SCAP) for securely analyzing CDM data from multiple hospitals in a distributed network environment. To analyze the CDM data of all hospitals connected to the platform, SCAP distributes and executes analysis codes (AC) and returns the analysis results to external investigators. While providing an automated CDM data analysis service, SCAP provides end-to-end authentication of users accessing the platform and guarantees the integrity of the CDM data analysis codes. The biggest advantage of this platform is that the medical data kept by each hospital are not shared beyond the hospital for analysis. This feature reduces the risk of the patients' personal information being leaked, and at the same time it enables us to obtain analysis results from multiple institutions. To safely manage this entire process, SCAP has the following features: (1) secure multi-center user authentication, (2) integrity assurance for the AC delivered over the network, (3) running the AC and returning analysis results, and (4) a platform user authentication method that does not require an account database at each hospital. To provide these features, the platform includes six subcomponents: a certificate authority (CA), an authentication server (AS), a code signer (CS), a ticket-granting server (TGS), a relaying server (RS), and a service server (SS). Among these components, the AS, CS, TGS, and RS form the central server group of the SCAP and provide an interface allowing users to deliver the AC to hospitals registered on the platform. The CA issues certificates for public key cryptography to all participants of the platform. Finally, the SS is installed in the internal network of each hospital, analyzes the CDM data using the AC delivered from the central server group, and returns the analysis results. The contributions of this paper are as follows:

• This paper proposes a process for safely analyzing medical CDM data in a multi-center distributed network environment.

• This paper describes in detail the functions of the components constituting the secure CDM-based data analysis platform (SCAP) and the communication between these components.
The rest of this paper is organized as follows. Section 2 presents existing research on handling multi-institutional medical data and on authentication methods. Section 3 describes the proposed data analysis platform. Section 4 details the implementation of the proposed platform. Lastly, Section 5 discusses the limitations and concludes the paper.

Related Work
This section introduces studies and widely used user authentication methods that address the security concerns that may arise from using multi-institutional medical data.

Approaches on Multi-Centered Medical Data
In this section, we introduce the existing privacy-preserving methods from different perspectives.
When combining data from different sources, patient privacy should be considered. Zhang et al. [3] proposed a solution for collecting data distributed across different departments and conducting data mining in an Internet of Health Things environment. The solution they proposed used locality-sensitive hashing (LSH) to consolidate data collected from multiple locations without compromising patient privacy. Ranbaduge et al. [5] and Vatsalan et al. [6] proposed privacy-preserving record linkage methods that connect multiple databases using a hash technique. In addition, Raisaro et al. [24] developed a platform that allows hundreds of clinical sites to share data and securely deliver data counts to external investigators. The platform uses homomorphic encryption to provide end-to-end confidentiality and differential privacy to protect against patient identification.
Most machine learning techniques that are actively employed in data analysis concentrate the data in central storage to train a model. Because medical data contain sensitive information, exporting the data outside the institutional network is subject to numerous restrictions by policy or by law. Joint techniques for training artificial intelligence models in a distributed environment without exposing data outside the institution have therefore been actively studied [25][26][27]. Existing learning algorithms for artificial intelligence models collect the data necessary for learning in central storage. By contrast, federated learning keeps the data in the locations where they are created and periodically aggregates only the intermediate learning states. Because the data are not exposed to the outside, federated learning is relatively free from privacy issues. Li et al. [28] suggested a method for learning a robust decision-making model in a distributed environment, although they did not explicitly mention federated learning in their study.
Another obstacle is the data format. To extract the same information from the data of multiple institutions, it is necessary to understand the database schema of every hospital. Because most organizations treat their database schema as confidential, such an extraction is not only extremely tedious, but it is also difficult for external investigators to determine the data format of a particular hospital. To solve this problem, efforts have been made in recent years to unify the format of medical data. The common data model (CDM) is the most successful example of standardizing a medical data representation [7][8][9]. The data custodian of the hospital converts the original medical data of the institution into CDM data through an extract-transform-load (ETL) process [11,12]. Most CDM specifications require anonymization to remove sensitive information contained in the original data during the ETL process [13,14]. Because the format of such complex medical data is unified, researchers and data analysts can explore and analyze data from multiple hospitals in the same way. In general, analysts use scripts or programs written in arbitrary programming languages. Although a CDM brings many benefits to data analysis, many medical institutions still maintain their data on the premises owing to legal issues. This operation policy leaves medical data logically or physically distributed over the network. To improve the research environment, research platforms for multi-center medical data have been constructed [29,30].

Authentication Methods
Most modern applications implement procedures to verify the authenticity of users. In this section, we introduce communication schemes used for authentication.
The JSON Web Token (JWT) securely transfers information between two parties in a lightweight and self-contained way using JSON objects [19]. The term "self-contained" means that the token carries all the information needed for a claim. In other words, all the information required for authentication is included in the token. A JWT consists of a header, a payload, and a signature; each part is Base64url-encoded, and the parts are separated by dots. The header specifies the algorithm used for signature generation. The signature part holds the digital signature over the encoded header and payload, calculated by the algorithm specified in the header. The payload holds the information to be contained in the token. This information is called claims and includes registered claims, public claims, and private claims as specified in the standard. JWT does not require special storage because the token itself contains all the information necessary for user authentication. However, it is difficult to set an appropriate token lifetime, and the more information the payload contains, the larger the token becomes.
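The three-part structure described above can be sketched with the Python standard library alone. This is a minimal illustration, assuming the HS256 (HMAC-SHA256) algorithm; the function names `make_jwt` and `verify_jwt` are hypothetical, and the sketch checks only the signature, not registered claims such as the expiration time.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode and strip the '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_jwt(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature, signing the first two parts."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    tag = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(tag)


def verify_jwt(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature matches, otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        signing_input = (header_b64 + "." + payload_b64).encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return None
        return json.loads(b64url_decode(payload_b64))
    except ValueError:  # malformed token, bad Base64, or bad JSON
        return None
```

Because the signature covers both the header and the payload, changing any claim without knowing the secret invalidates the token, which is what allows the verifier to work without a session store.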
Open authorization (OAuth) 2.0 is an open standard that lets users grant websites or applications access to their information on other websites without providing their passwords [20]. OAuth 2.0 deals with both authentication and authorization. The users (resource owners) are authenticated by the application (client) with their credentials. The client uses these credentials to request an access token from the authorization server. The authorization server verifies the received credentials and issues an access token. The client presents the access token and obtains the resources from the resource server. OAuth 2.0 is not only adopted as a popular authentication and authorization method in modern web services, but it is also very convenient for users. In addition, services can share resources with each other and can easily be extended with more functionalities. However, OAuth 2.0 must be used over HTTPS, and the access tokens should be managed securely. Because this scheme supports many authentication methods, developers need to understand the specification correctly in order to set up the operating environment correctly.
Lastly, Kerberos provides a ticket-based centralized authentication service [23]. In an environment where services are distributed on a network, users typically have to prove their identity to each service. Each service must then maintain storage for the users' authentication information, which makes it vulnerable to security breaches. To resolve this inconvenience, Kerberos provides a simplified authentication mechanism using an authentication server (AS), a ticket-granting server (TGS), and a service server (V). The users prove their identity to the AS and receive a token. The users present the token and the ID of V to the TGS and receive a ticket. This ticket, issued by the TGS, is presented when the users access V and is used to authenticate them. If the users access another V, they obtain a new ticket by presenting the token of the AS and the ID of the new V, without repeating authentication from the beginning. Owing to this simplified authentication process, Kerberos is often employed to implement single sign-on (SSO) functionality. However, a Kerberos ticket grants access to only one service at a time; simultaneous access to multiple services is outside the scope of the protocol design.

Methods
In this section, we describe the operating environment of the SCAP and the roles of the subcomponents constituting the platform in detail. Inspired by the convenient authentication protocol of Kerberos, we designed secure communication between the components of the SCAP.

Operating Environment
Figure 1 shows the operating environment of the SCAP. This platform is built on a distributed network where hospitals store CDM data on their servers, and these servers are connected to the network. In this figure, the central server group consists of the AS, CS, TGS, and RS. The hospital network consists of an SS and a database of CDM data. The CA is placed on the Internet and provides trust for all SCAP components with public key certificates. The SCAP uses the Observational Medical Outcomes Partnership CDM (OMOP-CDM) [7] as a medical CDM. OMOP-CDM is a database schema defined by Observational Health Data Sciences and Informatics (OHDSI). In general, researchers using OMOP-CDM analyze data using the R language and software provided by OHDSI [31,32]. All hospital networks are connected to the server group through the Internet. The components of the SCAP take part in the process of analyzing the medical CDM data as follows: (1) The users prove their identity to the platform. The AS is responsible for user authentication and issues a token to authenticated users. (2) Through the CS, the users obtain a digital signature for the analysis codes written to analyze the medical CDM data held by hospitals. (3) The users receive a ticket from the TGS proving their identity to the SS of each hospital. (4) The users distribute the analysis codes to all hospitals connected to the platform through the RS and receive the analysis results. As a preliminary task for the safe operation of these processes, all platform servers and users create their private/public keys and receive public key certificates from the CA. These certificates conform to the X.509 version 3 [33] standard, which is currently the most widely applied. Platform participants can obtain certificates in the same way as for generic electronic financial transactions.
In this section, we describe the network protocol and the information transmitted between the components of the SCAP. In the results section, we present the detailed implementation of each component along with the algorithms used.

Communication between User and AS
Before the communication between a user and the AS, it is assumed that the user has been appropriately registered in the AS. This assumption implies that the user is also enrolled in the SCAP. Figure 2 shows the process by which the AS authenticates a user of the SCAP. A user who wants to use the platform sends the AS a message M_C containing the user's identifier ID_C, such as an ID/password or biometric, together with a signature sig_{M_C} of M_C, to request proof of identity (flow 1 in Figure 2). The signature is transmitted along with the message so that the receiver can detect tampering by attackers and verify the sender. Because all components of the SCAP always transmit a message-signature pair, we omit the description of signatures for convenience in the remainder of this section. If the requestor is a legitimate user, the AS returns a message M_{AS}, which includes a cryptographically generated token (denoted token), and its signature sig_{M_{AS}} to the user (flow 2 in Figure 2). This token is generated by encrypting the user's identity and expiration time using a symmetric key cryptographic algorithm, such as DES [34] or AES [35]. Therefore, only participants who know the key used to create the token can verify its validity. The AS does not share the key used to generate the token with anyone, to prevent a third party from forging tokens. In addition, for users not registered on the platform or for illegal requests, the AS delivers an authentication failure response. Such a user cannot use any SCAP service.

Figure 2. Sequence diagram for communication between the user and AS.
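The token handling on the AS side can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: because the Python standard library offers no AES, the sketch protects the identity and expiration time with an HMAC-SHA256 tag instead of symmetric encryption, so the token contents are authenticated but not hidden. The names `AS_KEY`, `issue_token`, and `validate_token` are hypothetical.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

# Hypothetical secret held only by the AS (stand-in for its symmetric key).
AS_KEY = secrets.token_bytes(32)


def issue_token(user_id: str, lifetime_s: int = 3600,
                now: Optional[float] = None) -> str:
    """Bind the user's identity to an expiration time under the AS key."""
    now = time.time() if now is None else now
    body = json.dumps({"id": user_id, "exp": now + lifetime_s}).encode("utf-8")
    tag = hmac.new(AS_KEY, body, hashlib.sha256).hexdigest()
    return body.hex() + "." + tag


def validate_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user ID if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body_hex, tag = token.split(".")
        body = bytes.fromhex(body_hex)
    except ValueError:  # wrong number of parts or non-hex body
        return None
    expected = hmac.new(AS_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8")):
        return None
    fields = json.loads(body)
    return fields["id"] if now < fields["exp"] else None
```

Only a holder of `AS_KEY` can produce or check a valid tag, which mirrors the property stated above that the token can be verified only by participants who know the key.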

Communication between User and CS
It is assumed that the user accessing the CS has passed the authentication of the AS. Figure 3 shows how a user receives a digital signature for the AC used to analyze the CDM data. Before accessing the CS, the user creates an AC to analyze the CDM data of hospitals through the SCAP. An AC can be written in a variety of ways. Because the SCAP is implemented based on the OMOP-CDM, users can use software [15,36] provided by OHDSI. The AC is generally written in the R language and contains all the information for analyzing the CDM data, such as cohort definitions and SQL queries. Because the AC is used to analyze medical data, it should not be tampered with until it is delivered to each hospital over the network. A digital signature is used to guarantee data integrity and non-repudiation during transmission. To satisfy this requirement, the user requests the CS to create a digital signature for the AC. To this end, the user delivers a message M_{C−CS} and its signature sig_{M_{C−CS}} (flow 1 in Figure 3). The message M_{C−CS} includes the AC and the token issued by the AS as described in the previous section. The CS creates a signature sig_{AC} for the AC using its private key and wraps it in a message M_{CS−C}. The CS then delivers M_{CS−C} and its signature sig_{M_{CS−C}} in response to the user (flow 2 in Figure 3). Because this signature was created with the private key of the CS, no one except the CS can create the same signature. However, anyone who obtains the public key of the CS can verify the digital signature. If the user modifies a part of the AC or writes a completely new AC, the user must request a new signature from the CS.
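The hash-then-sign idea behind sig_{AC} can be sketched with unpadded "textbook" RSA using only built-in integer arithmetic. This is a toy under loudly stated assumptions: the key is built from two publicly known Mersenne primes and uses no padding, so it offers no real security, and the names `sign_ac` and `verify_ac` are hypothetical. A real deployment would use a vetted library with RSA-PSS or ECDSA.

```python
import hashlib

# TOY key material for illustration only: two well-known Mersenne primes.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                              # public modulus of the CS
E = 65537                              # public exponent of the CS
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, kept by the CS


def digest(ac: bytes) -> int:
    """SHA-256 of the analysis code, interpreted as an integer below N."""
    return int.from_bytes(hashlib.sha256(ac).digest(), "big")


def sign_ac(ac: bytes) -> int:
    """CS side: sig_AC = H(AC)^D mod N, computable only with D."""
    return pow(digest(ac), D, N)


def verify_ac(ac: bytes, sig_ac: int) -> bool:
    """Anyone holding the public key (N, E) can run this check."""
    return pow(sig_ac, E, N) == digest(ac)
```

Any change to the AC changes its digest, so the check fails and the user has to request a fresh signature, as described above.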

Communication between User and TGS
Users who have passed authentication with the AS and obtained a digital signature for the AC can use the TGS to obtain a ticket to access the SSs connected to the SCAP. Figure 4 shows the process of issuing a ticket for a user to access the CDM data of all hospitals in the SCAP through the TGS. To request a ticket from the TGS for accessing the SS of a hospital connected to the SCAP, the user transmits a message M_{C−TGS}, which includes token and sig_{AC}, along with its signature sig_{M_{C−TGS}} (flow 1 in Figure 4). It does not matter whether M_{C−TGS} contains the AC, as long as it is proven that the generator of sig_{AC} is the CS. The TGS delivers a message M_{TGS−C}, which includes a cryptographically generated ticket, and its signature sig_{M_{TGS−C}} to the user (flow 2 in Figure 4). Similar to the authentication token of the AS, this ticket is generated using a symmetric key cryptographic algorithm with information such as the user's ID and expiration time. However, unlike in the case of the AS, the key used to create the ticket is encrypted with the public key of the SS receiving the ticket and then delivered to the user along with the ticket. Therefore, the user acquires as many tickets as there are SSs connected to the platform. The user cannot decrypt the verification key delivered with the ticket. That is, no party other than the SS, including the user and any third party who has obtained a ticket, can create or decrypt a ticket. The TGS is an essential component of the SCAP operating on a distributed network. Without the help of the TGS, users who want to analyze CDM data from multiple medical institutions would have to prove their identity to the administrator of each institution. Moreover, in such an environment, each institution would have to operate a database that stores the identities of all users who want to use the data. To alleviate these obstacles, the TGS creates a ticket that allows any medical institution to authenticate the identity of a user attempting to analyze its data. By verifying this ticket, the SS of each hospital can authenticate the user without a user account database.
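The per-hospital ticket scheme can be sketched as below. This is a toy under stated assumptions rather than the platform's code: the ticket body is protected with an HMAC-SHA256 tag in place of symmetric encryption, a fresh ticket key is assumed for every SS, and the key is wrapped with unpadded textbook RSA over publicly known Mersenne primes, which is insecure. All names (`toy_keypair`, `issue_tickets`, `ss_verify`) are hypothetical.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

E = 65537  # public exponent shared by all toy SS keys


def toy_keypair(p: int, q: int) -> Tuple[int, int]:
    """Return (n, d) for unpadded textbook RSA. Illustration only."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))


@dataclass
class Ticket:
    body: bytes       # user ID, expiration time, and target SS
    tag: bytes        # protects the body under the ticket key
    wrapped_key: int  # ticket key encrypted with one SS's public key


def issue_tickets(user_id: str, exp: float,
                  ss_public: Dict[str, int]) -> Dict[str, Ticket]:
    """TGS side: one ticket per SS, each with its own wrapped key."""
    tickets = {}
    for ss_id, n in ss_public.items():
        key = secrets.token_bytes(32)
        body = json.dumps({"id": user_id, "exp": exp, "ss": ss_id}).encode("utf-8")
        tag = hmac.new(key, body, hashlib.sha256).digest()
        tickets[ss_id] = Ticket(body, tag, pow(int.from_bytes(key, "big"), E, n))
    return tickets


def ss_verify(ticket: Ticket, ss_id: str, n: int, d: int,
              now: float) -> Optional[str]:
    """SS side: unwrap the key with the private exponent, then check the ticket."""
    try:
        key = pow(ticket.wrapped_key, d, n).to_bytes(32, "big")
    except OverflowError:  # unwrapped with the wrong private key
        return None
    expected = hmac.new(key, ticket.body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, ticket.tag):
        return None
    fields = json.loads(ticket.body)
    if fields["ss"] != ss_id or now >= fields["exp"]:
        return None
    return fields["id"]
```

Each SS needs only its own key pair to accept a user, which reflects the point made above that no hospital has to maintain a user account database.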

Communication for Distributing AC
We implemented three modes for distributing the AC and analyzing the CDM data of SCAP-connected hospitals: full-automation mode, intervention mode, and hybrid mode. In this section, we describe these three modes.
Figure 5 shows the communication process between the user, the RS, and the SSs of hospitals in full-automation mode. In this mode, as the name suggests, the entire process, from the distribution of the AC to the return of the CDM data analysis results, is conducted automatically. It is assumed that the user has a token, the AC, sig_{AC}, and a ticket in advance. First, the user transmits the message M_{C−RS} with the above four pieces of information and the signature sig_{M_{C−RS}} to the RS of the central server group (flow 1 in Figure 5). The RS verifies the token in M_{C−RS} to determine whether the AS has authenticated the user. If the user is verified, the RS repackages the remaining information, except for the token, in M_{RS−SS} and sends this message with its signature to the SSs (SS_1 to SS_n) (flow 2 in Figure 5). Each SS checks the authenticity of the user and the AC using the ticket and sig_{AC}, respectively. If the verification of both the user and the AC is successful, the SS runs the AC and analyzes the CDM data of its hospital (flow 3 in Figure 5). The analysis result (AR) is then returned to the RS (flow 4 in Figure 5). Finally, the user downloads the ARs from the hospitals (flow 5 in Figure 5). In full-automation mode, users can analyze the hospital data without any intervention.
However, in full-automation mode, the hospital (or data manager) cannot control the process of analyzing the data of the institution and exporting the results. Institutions may simply want to deny the user access to the data. Allowing data to be analyzed without appropriate controls can occasionally violate the security policy of an institution. To satisfy these requirements, the intervention mode requests the hospital custodian to approve the execution of the AC and the return of the AR. First, as in full-automation mode, the user transmits a token, the AC, sig_{AC}, and a ticket to the RS (flow 1 in Figure 6). The RS verifies the user's identity by validating the token. The RS distributes the AC, sig_{AC}, and the ticket from the user to the SSs of the hospitals (flow 2 in Figure 6). Each SS verifies the received ticket and digital signature to verify the user and the integrity of the AC. The SS does not immediately execute the AC, but waits for approval from the data custodian of the hospital. The custodian reviews the AC delivered to the hospital and approves the execution (flow 3 in Figure 6). Upon approval, the SS runs the AC to analyze the CDM data of the hospital (flow 4 in Figure 6). If the custodian suspends or rejects the AC execution, the AC is not executed. Likewise, the SS does not immediately return the AR for the CDM data to the user, but rather waits for the export approval of the custodian. If the custodian approves its export (flow 5 in Figure 6), the user can obtain the AR (flow 6 in Figure 6). By contrast, if the custodian suspends or rejects its export, the user cannot access the AR even after the data analysis has been completed. Intervention mode gives the hospital control over the data analysis. Hospitals can configure more secure policies than in full-automation mode.
However, if there are many requests for analysis in intervention mode, custodians might be burdened because they are responsible for approving every data analysis. To overcome this shortcoming, we implemented a hybrid mode, which is a combination of full-automation and intervention modes. Each hospital registered on the platform has a whitelist of users who can analyze the data of the institution without restriction. If the user is registered on the whitelist, the SS applies full-automation mode; otherwise, it applies intervention mode to the unregistered user. Using hybrid mode, the hospital can flexibly and efficiently control the data analysis of the user.
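The mode selection in hybrid mode and the two custodian approvals of intervention mode can be sketched as a small state model. This is an illustrative sketch only; the names `Mode`, `select_mode`, and `AnalysisJob` are hypothetical and do not come from the platform's source code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Mode(Enum):
    FULL_AUTOMATION = "full-automation"
    INTERVENTION = "intervention"


def select_mode(user_id: str, whitelist: Set[str]) -> Mode:
    """Hybrid mode: whitelisted users skip the custodian approvals."""
    return Mode.FULL_AUTOMATION if user_id in whitelist else Mode.INTERVENTION


@dataclass
class AnalysisJob:
    user_id: str
    mode: Mode
    run_approved: bool = False     # custodian approval to execute the AC
    export_approved: bool = False  # custodian approval to export the AR

    def __post_init__(self) -> None:
        # In full-automation mode both approvals are implicit.
        if self.mode is Mode.FULL_AUTOMATION:
            self.run_approved = True
            self.export_approved = True

    def can_run(self) -> bool:
        return self.run_approved

    def can_export(self) -> bool:
        # The AR exists only after an approved run, so both flags are needed.
        return self.run_approved and self.export_approved
```

For a user outside the whitelist, the job stays blocked until the custodian sets `run_approved`, and the result stays unavailable until `export_approved` is set as well.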

Results
We developed the SCAP, a platform for analyzing CDM data in a distributed network environment, based on the communication protocol described in the previous section. This section describes the implementation details of the SCAP. The source codes of the SCAP are available from the URLs listed in Table 1. Each of the subcomponents of the SCAP is implemented as a web application. Interfaces for communication between subcomponents are implemented as representational state transfer (REST) APIs. By virtue of the REST API, each server can provide its functionalities not just to the SCAP, but also to any application that requires a similar functionality.
A user invokes the authenticate procedure with identity information ID_C. This procedure creates a message M_{C−AS} wrapping ID_C and its signature sig_{M_{C−AS}}, and requests user authentication from the AS (line 4 in Algorithm 1 for flow 1 in Figure 2). When the AS receives an authentication request from the user, it calls the handle_authentication procedure using M_{C−AS} and sig_{M_{C−AS}}. This procedure verifies sig_{M_{C−AS}} to determine the integrity of M_{C−AS} and the sender (lines 1-5 in Algorithm 2). If the verification of sig_{M_{C−AS}} fails, the AS returns an appropriate ERROR to the user. Otherwise, the AS uses ID_C to check whether the sender is a registered user of the SCAP (lines 7-11 in Algorithm 2). If the user is not registered, the AS returns an ERROR. When the full verification is successfully completed, the AS generates a token based on cryptography and returns M_{AS−C} and its signature sig_{M_{AS−C}} to the user (line 16 in Algorithm 2 for flow 2 in Figure 2). When the AS returns a token or an ERROR, the execution of the authenticate procedure resumes from line 4. The user verifies sig_{M_{AS−C}} to verify the integrity and source of M_{AS−C} (lines 6-9 in Algorithm 1). If M_{AS−C} was not created by the correct AS or was corrupted during network communication, the authenticate procedure returns an ERROR. Otherwise, it obtains the token from M_{AS−C} and returns it (line 12 in Algorithm 1). The user stores the acquired token safely in local storage.

Implementation for CS
Algorithms 3 and 4 show the procedures executed by the user (Algorithm 3) and the CS (Algorithm 4) to generate a digital signature for the AC written by the user, as shown in Figure 3. To analyze the CDM data as intended by the user, the AC should not be tampered with by attackers while being transmitted to the SS. In addition, the author of the AC should be proven in an end-to-end manner.
The user calls the request_to_sign_AC procedure with the AC and the token issued by the AS. This procedure requests the CS to issue a digital signature that guarantees the integrity of the AC. The procedure sends M_{C−CS} with its signature sig_{M_{C−CS}} (line 4 in Algorithm 3 for flow 1 in Figure 3). The message M_{C−CS} contains both the AC and the token. Upon receiving the request, the CS invokes the handle_signing_AC_request procedure. This procedure uses sig_{M_{C−CS}} to verify the integrity and the sender of M_{C−CS} (lines 2-5 in Algorithm 4). The CS extracts the token from M_{C−CS} and checks the identity of the sender (lines 7-11 in Algorithm 4). If any of these checks fail, the procedure returns an ERROR. Otherwise, the CS acquires the AC from M_{C−CS} and generates a digital signature sig_{AC} (lines 13 and 14 in Algorithm 4). A hash algorithm, such as a secure hash algorithm (SHA), and a public key signature algorithm (RSA [37] or ECDSA [38]) may be used to generate the digital signature. This procedure returns M_{CS−C}, which wraps sig_{AC}, and its signature sig_{M_{CS−C}} to the user (lines 16-18 in Algorithm 4 for flow 2 in Figure 3). When a response is received from the CS, the request_to_sign_AC procedure resumes from line 4. This procedure validates sig_{M_{CS−C}} to verify the message and the sender (lines 6-9 in Algorithm 3). If the integrity check succeeds, the procedure obtains sig_{AC} from M_{CS−C} and returns it (line 12 in Algorithm 3). The user safely maintains the acquired sig_{AC}.
... execute the AC, even if all verifications are successful. In general, AC execution is a time-consuming task; therefore, instead of running Algorithms 7-9 synchronously, we append the AC to the thread pool. If the AC is successfully added to the thread pool, this procedure returns a SUCCESS message, and if it fails for any reason, it returns an ERROR immediately (lines 19-24 in Algorithm 9). If the thread pool is full owing to many data analysis requests from other users, a scheduler suspends the execution of the AC until resources become available. Thus, the computational resources of the SS can be used efficiently. The return value of the handle_analyzing_CDM_data_request procedure is delivered to the user through the RS without delay.
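The "queue the AC and answer immediately" behavior can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch, not the SS implementation; the class name `AnalysisScheduler` and the callable `run_ac` are hypothetical, and the status strings are borrowed from the SS status messages used in this paper.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class AnalysisScheduler:
    """Runs at most max_workers ACs at once; the rest wait in the pool's queue."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, job_id: str, run_ac: Callable[[], bytes]) -> str:
        """Queue the AC and answer at once instead of waiting for the result."""
        try:
            self._jobs[job_id] = self._pool.submit(run_ac)
        except RuntimeError:  # raised when the pool has been shut down
            return "ERROR"
        return "SUCCESS"

    def status(self, job_id: str) -> str:
        future = self._jobs[job_id]
        if future.done():
            return "Analysis complete"
        if future.running():
            return "Running"
        return "Waiting for execution"

    def shutdown(self) -> None:
        """Wait for queued and running ACs to finish, then stop the pool."""
        self._pool.shutdown(wait=True)
```

Bounding `max_workers` is the design choice that keeps a burst of requests from exhausting the hospital server: excess jobs simply remain in the waiting state until a worker is free.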
Figure 7 shows the structure of the SS used to implement Algorithm 9. Unlike the other components of the SCAP, the SS provides various functions for analyzing medical data.

(1) The SS places the AC into the queue. In general, executing an AC requires considerable computational resources; it is therefore not feasible to execute all analysis requests from users in parallel. Thus, the SS limits the number of ACs that can run at a time using a management pool for the ACs.

(2) The SS can run the AC to analyze the CDM data of the institution. If all previous verifications are successful, the SS executes the AC using the R engine. If the AC is written in a programming language other than R, the appropriate execution engine is called instead. When the data analysis is complete, the SS stores the analysis results (AR) in a compressed file format, such as a zip file.

(3) The SS may notify the user of the CDM data analysis status. The data analysis process is time-consuming (typically several hours). In addition, the execution time of the AC may differ among institutions. Because the user cannot wait for the data analysis of all hospitals to be completed, the user should periodically check the analysis status of each hospital. When the user requests the analysis status, the SS responds with a "Waiting for execution", "Running", or "Analysis complete" message. If the SS is in full-automation mode, the AC is executed immediately without going through the execution standby state.

(4) The SS may notify the user of the AR export status. As with the AC execution, the SS receives instructions from the custodian regarding whether to provide the results of the CDM data analysis to the user. When the user asks the SS whether the AR can be exported, the SS responds with either "available for download" or "waiting for approval". Users can obtain the AR only if its status is "available for download". As with function 3, when the SS is in full-automation mode, the "waiting for export" state is not used.

Although the SCAP has basic functions for analyzing CDM data in a distributed network environment, it has the following limitations. First, at this point, the SCAP does not have an access control mechanism for users. If each hospital operates the SS in intervention or hybrid mode, user access to the data of the institution can be restricted manually or automatically. However, this is not a desirable form of access control; user access should be controlled at the level of the central server group. Fortunately, the SCAP already uses the AS to authenticate users, and thus it is relatively easy to embed access control into the platform. Second, the SCAP allows CDM data from each hospital to be analyzed only individually. However, because medical data show different statistical characteristics depending on the geographical location, analyzing the data of each hospital individually is insufficient for big data analysis. Additional techniques, such as federated learning, should be considered to solve this problem.

Conclusions
This paper proposed the SCAP, a platform that facilitates analysis in a distributed environment in which CDM data are stored in the internal network of a hospital. The SCAP consists of the CA, AS, CS, TGS, RS, and SS, which together protect the data analysis process. We designed and implemented a communication protocol among the components. In addition, some limitations of the platform were discussed. At this point, the SCAP is at the prototype level. In the future, we plan to expand this platform through continued research and development.

Figure 3. Sequence diagram for the communication between the user and CS.

Figure 4. Sequence diagram for communication between the user and TGS.

Figure 5. Sequence diagram for distributing AC in full-automation mode.

Figure 6. Sequence diagram for distributing AC in intervention mode.

4.1. Implementation of AS

Algorithms 1 and 2 show the algorithms for issuing authentication tokens, as in Figure 2, on the side of the SCAP user (Algorithm 1) and of the AS (Algorithm 2). Rather than operating individually, these algorithms fulfill their respective roles through synchronization.

Algorithm 1: The authentication algorithm on the user side
1  Procedure authenticate($ID_C$)
2    $M_{C-AS}$ ← make a message with $ID_C$
3    $sig_{M_{C-AS}}$ ← generate a signature for $M_{C-AS}$
4    $M_{AS-C}$, $sig_{M_{AS-C}}$ ← request to AS for authentication with $M_{C-AS}$ || $sig_{M_{C-AS}}$
5
6    sig_check ← verify $sig_{M_{AS-C}}$
7    if sig_check failed then
…

Algorithm 2: The authentication algorithm on the AS side
1  Procedure handle_authentication($M_{C-AS}$, $sig_{M_{C-AS}}$)
2    sig_check ← verify $sig_{M_{C-AS}}$
3    if sig_check failed then
4      return ERROR
5    end
6
7    $ID_C$ ← get an identifier from $M_{C-AS}$
8    id_check ← verify an identifier
9    if id_check failed then
…
   token ← generate token
14   $M_{AS-C}$ ← make a message with token
15   $sig_{M_{AS-C}}$ ← generate a signature for $M_{AS-C}$
16   return $M_{AS-C}$ || $sig_{M_{AS-C}}$
17 end
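The exchange in Algorithms 1 and 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SCAP implementation: an HMAC over a pre-shared key stands in for the public-key signatures of the protocol so that the example needs only the standard library, and `SHARED_KEY`, `REGISTERED_IDS`, and the JSON message layout are hypothetical. The surviving lines of Algorithm 1 do not show what happens after the reply is verified; returning the token is an assumption.

```python
import hashlib
import hmac
import json
import secrets

SHARED_KEY = b"demo-key"           # assumption: stand-in for CA-certified key pairs
REGISTERED_IDS = {"researcher01"}  # hypothetical identifiers known to the AS


def sign(message: bytes) -> bytes:
    # Stand-in for "generate a signature for M" (HMAC-SHA256, not RSA/ECDSA).
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def verify(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(message), signature)


def handle_authentication(m_c_as: bytes, sig_m_c_as: bytes):
    """AS side (cf. Algorithm 2): check signature, check identifier, issue token."""
    if not verify(m_c_as, sig_m_c_as):
        return "ERROR"
    user_id = json.loads(m_c_as)["id"]
    if user_id not in REGISTERED_IDS:
        return "ERROR"
    token = secrets.token_urlsafe(16)
    m_as_c = json.dumps({"token": token}).encode()
    return m_as_c, sign(m_as_c)


def authenticate(user_id: str):
    """User side (cf. Algorithm 1): send a signed request and verify the reply."""
    m_c_as = json.dumps({"id": user_id}).encode()
    # A direct function call replaces the network request in this sketch.
    reply = handle_authentication(m_c_as, sign(m_c_as))
    if reply == "ERROR":
        return "ERROR"
    m_as_c, sig_m_as_c = reply
    if not verify(m_as_c, sig_m_as_c):
        return "ERROR"
    return json.loads(m_as_c)["token"]  # assumption: the verified token is returned


print(authenticate("researcher01") != "ERROR")  # a registered user obtains a token
print(authenticate("unknown-user"))             # an unregistered identifier is rejected
```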

Algorithm 3: The code signing algorithm on the user side
1  Procedure request_to_sign_AC(AC, token)
2    $M_{C-CS}$ ← make a message with AC, token
3    $sig_{M_{C-CS}}$ ← generate a signature for $M_{C-CS}$
4    $M_{CS-C}$, $sig_{M_{CS-C}}$ ← request to CS for signing AC with $M_{C-CS}$ || $sig_{M_{C-CS}}$
…
   $sig_{AC}$ ← get a signature for AC from $M_{CS-C}$
12   return $sig_{AC}$
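The hash-then-sign step that the CS performs on the AC can be illustrated with textbook RSA over a SHA-256 digest. This is a sketch for illustration only: the key is built from two small, publicly known Mersenne primes, no padding is applied, and the digest is reduced modulo N, so it is not secure and does not reflect the key sizes or libraries used by the SCAP. The names `sign_ac` and `verify_ac` and the sample analysis code are hypothetical.

```python
import hashlib

# Toy RSA key (assumption for illustration; far too small and unpadded for real use).
P = 2**89 - 1    # Mersenne prime
Q = 2**127 - 1   # Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)


def digest(ac: bytes) -> int:
    # SHA-256 digest of the analysis code, reduced so that it fits below N.
    return int.from_bytes(hashlib.sha256(ac).digest(), "big") % N


def sign_ac(ac: bytes) -> int:
    """CS side: sig_AC = H(AC)^D mod N."""
    return pow(digest(ac), D, N)


def verify_ac(ac: bytes, sig_ac: int) -> bool:
    """Any holder of the public key (N, E) can check that the AC is unmodified."""
    return pow(sig_ac, E, N) == digest(ac)


ac = b"result <- summary(cdm_table)"  # hypothetical R analysis code
sig_ac = sign_ac(ac)
print(verify_ac(ac, sig_ac))                 # check against the original AC
print(verify_ac(ac + b" # edited", sig_ac))  # check against a modified AC
```

Because verification needs only the public key, a party that receives the AC together with $sig_{AC}$ can detect modification without holding the signing key of the CS.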

Algorithm 5: The ticket issuing algorithm on the user side
1  Procedure request_to_issue_ticket(token, $sig_{AC}$)
2    $M_{C-TGS}$ ← make a message with token and $sig_{AC}$
3    $sig_{M_{C-TGS}}$ ← generate a signature for $M_{C-TGS}$
4    $M_{TGS-C}$, $sig_{M_{TGS-C}}$ ← request to TGS for issuing ticket with $M_{C-TGS}$ || $sig_{M_{C-TGS}}$
…

Algorithm 8: The AC distributing algorithm on the RS side
1  Procedure handle_distributing_AC_request($M_{C-RS}$, $sig_{M_{C-RS}}$)
2    sig_check ← verify $sig_{M_{C-RS}}$
3    if sig_check failed then
…
13   AC ← get the analyzing codes from $M_{C-RS}$
14   $sig_{AC}$ ← get the signature for AC from $M_{C-RS}$
15   ticket ← get the ticket from $M_{C-RS}$
16
17   $M_{RS-SS}$ ← make a message with AC, $sig_{AC}$, and ticket
18   $sig_{M_{RS-SS}}$ ← generate a signature for $M_{RS-SS}$
19   result ← distribute the analyzing codes with $M_{RS-SS}$ || $sig_{M_{RS-SS}}$
…

Algorithm 9: The AC distributing on the SS side
1  Procedure handle_analyzing_CDM_data_request($M_{RS-SS}$, $sig_{M_{RS-SS}}$)
…
20   if result success then  // AC will be executed to analyze the CDM data by the thread scheduler and AR will be saved.
…
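The status values that the SS reports to the user once an AC has been accepted (functions 3 and 4 of the SS structure in Figure 7) can be sketched as a small state model. The class and method names below (`AnalysisJob`, `approve_execution`, and so on) are hypothetical. The sketch covers only the full-automation and intervention modes, does not model the hybrid mode or any waiting time in the thread pool, and illustrates the described behavior rather than the SCAP code.

```python
from dataclasses import dataclass


@dataclass
class AnalysisJob:
    """Tracks one AC on an SS; names are hypothetical, status strings follow the text."""
    full_automation: bool
    analysis_status: str = "Waiting for execution"
    export_status: str = "waiting for approval"

    def __post_init__(self) -> None:
        if self.full_automation:
            # No custodian step: skip the standby state and the export approval.
            self.analysis_status = "Running"
            self.export_status = "available for download"

    def approve_execution(self) -> None:
        # Custodian instruction in intervention mode.
        if self.analysis_status == "Waiting for execution":
            self.analysis_status = "Running"

    def finish(self) -> None:
        if self.analysis_status == "Running":
            self.analysis_status = "Analysis complete"

    def approve_export(self) -> None:
        # Custodian instruction allowing the AR to be provided to the user.
        if self.analysis_status == "Analysis complete":
            self.export_status = "available for download"

    def can_download(self) -> bool:
        return (self.analysis_status == "Analysis complete"
                and self.export_status == "available for download")


job = AnalysisJob(full_automation=False)
job.approve_execution()
job.finish()
print(job.can_download())  # export has not yet been approved by the custodian
job.approve_export()
print(job.can_download())  # the AR may now be obtained
```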