Secure Delivery Scheme of Common Data Model for Decentralized Cloud Platforms

Featured Application: The proposed Secure-Cloud Common Data Model (SC-CDM) system is designed as a blockchain-based platform using a distributed ledger. It provides reliable confidentiality, security, and expandability to enhance the usability of CDM, a data format that facilitates data analysis.

Abstract: The Common Data Model (CDM) is being used to address problems caused by the various electronic medical record structures in distributed hospital information systems. CDM is emerging as a collaborative method in which each hospital exchanges data in the same format and various clinical studies are conducted on the shared data. A baseline CDM system is centralized, with an infrastructure typically controlled by a single entity with full authority, and this centralization can pose serious security issues. Therefore, the proposed SC-CDM system is designed as a distributed ledger platform that provides data with a high level of confidentiality, security, and scalability. This framework provides a reference model that supports multiple channels, using secure CDM as an encryption method. The data confidentiality of CDM is guaranteed by asymmetric and symmetric protocols. CDM delivery is protected by a symmetric key signed by the CDM creator, and distributed ledger transactions are kept lightweight by using the InterPlanetary File System (IPFS) as a file share. To deliver an encrypted CDM on the SC-CDM platform, the CDM is encrypted with a block cipher using a random symmetric key and Initialization Vector (IV). The symmetric key protocol is used for fast encryption of large-capacity data. SC-CDM implements the repository for the encrypted CDM with IPFS, while the symmetric key, two hash values, and IV are shared through the blockchain. Data confidentiality in SC-CDM is guaranteed by allowing only registered users to access the data.
In conclusion, SC-CDM is the first approach to demultiplexing with a data confidentiality proof based on asymmetric key cryptography. We analyze and verify the security of SC-CDM by comparing qualitative factors and performance with existing CDM. Moreover, we adopt a byte-level processing method with encryption to ensure efficiency while handling a large CDM.


Introduction
In recent years, medical services have shifted from treatment to the prevention and management of diseases, and the use of medical information is becoming more important for efficient patient health management [1]. A Hospital Information System (HIS) manages Electronic Medical Records (EMRs), digital versions of patient charts that contain each patient's medical history. Each hospital has a different […]

1. The SC-CDM is the first approach to demultiplexing with a data confidentiality proof of CDM based on asymmetric key cryptography. We analyze and prove the security of SC-CDM.
2. Compared to previous solutions, we adopt a byte-level processing method with encryption to ensure efficiency while handling a large CDM.
3. The comparison results show that our scheme achieves higher performance than other schemes.
4. By using X.509 asymmetric keys for DID (Decentralized Identifier)-based identity management (signing, plus the autonomous generation, updating, and removal of DIDs), the risk of personal information exposure is reduced.

Distributed Ledger
A distributed ledger is a new trend in the field of information technology, and many financial and non-financial applications have demonstrated its value. Blockchain is currently the leading distributed ledger technology, offering solutions to challenges such as traceability, transparency, trust, and accountability [4,5]. Distributed ledgers are a multi-purpose technology specifically designed to be shared across a network of multiple sites, geographies, or institutions.
A blockchain is a linear form of distributed ledger composed of immutable blocks of data, each containing a list of transactions and a unique reference to its predecessor block [6]. It is commonly considered a specialized form of distributed database [7]. Blockchain offers the ability to create unique (sometimes scarce) digital assets and can solve the Single Point of Failure (SPF) problem [8]. Bitcoin's originator proposed a mechanism to ensure digital data rights management [9].
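The block structure described above, in which each block holds a list of transactions and a reference to its predecessor, can be sketched as a minimal hash-chained ledger. This is an illustrative Python sketch under simple assumptions: the block digest is SHA-256 over a canonical JSON encoding, and the class names `Block` and `Ledger` are hypothetical. It omits consensus, signatures, and networking entirely.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS_PREV = "0" * 64  # hypothetical placeholder reference for the first block


@dataclass
class Block:
    index: int
    transactions: List[str]
    prev_hash: str  # unique reference to the predecessor block

    def digest(self) -> str:
        # Hash a canonical JSON encoding so equal contents always give equal digests.
        payload = json.dumps(
            {"index": self.index, "transactions": self.transactions, "prev_hash": self.prev_hash},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


@dataclass
class Ledger:
    blocks: List[Block] = field(default_factory=list)

    def append(self, transactions: List[str]) -> Block:
        # Link the new block to the digest of the current last block.
        prev = self.blocks[-1].digest() if self.blocks else GENESIS_PREV
        block = Block(len(self.blocks), list(transactions), prev)
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        # Each block must reference the digest of the block before it,
        # so editing an earlier block breaks the link stored in its successor.
        return all(cur.prev_hash == prev.digest() for prev, cur in zip(self.blocks, self.blocks[1:]))
```

Editing any block except the last breaks the `prev_hash` link of its successor, which `is_valid()` detects. Protecting the newest block is left to replication and consensus among peers, which this sketch does not model.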
Blockchain has features that can be used for the transparent distribution of CDM data. The CDM provider functions as a peer that sequences and validates transactions on the blockchain. The system also guarantees high availability through a peer-to-peer (P2P) file sharing system.

Decentralized Identifier (DID)
Identifiers uniquely identify an item of data and give it a unique name that expresses its characteristics. A DID, which is globally unique, is a new type of identifier that can be operated with high availability and cryptographically verified [4]. DIDs also support the principle of self-sovereignty [10]; that is, they allow identity owners to create, manage, and discard identifiers as they see fit. DIDs support the use of multiple simultaneous identifiers, which enhances privacy. CDM can increase the interoperability of medical data between hospitals, but a secure medical data access environment is required before various researchers can analyze the data for purposes such as disease prevention.
Identity management for the CDM operation process of hospital medical and prescription data is classified into four types: isolated, centralized, federated, and distributed ID. In the isolated type, the user's identity is managed separately for each service: the user must complete identity registration (member registration) and verification (authentication) procedures for every service, and an identity management system must be securely built and operated for user authentication and access control of the cloud CDM service, which is expensive. In terms of the security system, cloud CDM users must register for identity verification and store or manage identity verification information (ID/PW, etc.) separately from their hospital ID.
The centralized type manages the identities of trusted users centrally, and the construction and operation efficiency of the identity management system is higher than that of the isolated type. Users need to register their identities only once, in the centralized management system, and can use the SSO (Single Sign-On) function to access multiple services through one authentication. Because of this feature, the centralized type is suitable for providing multiple services through a single cloud CDM. However, when a failure occurs in the central management system, the entire service becomes unavailable, and the closed structure limits interoperability and scalability.
The federated type is a method in which different service providers form a trust relationship for the convenience of users, and jointly manage the user's identity [11]. The federated type is suitable for SSO support between the hospital and the cloud CDM network. However, federated ID management should be preceded by establishing a trust relationship between the hospital and the cloud CDM, and if identity management is concentrated on a specific service provider, it has the same limitations as the centralized type.
In the distributed ID type, users create their own identification information and share it through DLT (Distributed Ledger Technology) [12], and DLT is also used to share the information that identifies each service provider. Because users no longer need to register verification information, identity management becomes more convenient. In terms of the security system, since the user's identity is not stored on a central server or database, there is little risk of ID leakage or theft, and DLT ensures the integrity of the identification information.

Blockchain
Blockchain technology transparently records transaction history on a ledger that anyone can read, and copies and stores that ledger on multiple computers. Since transactions take place directly between individuals, there is no need for a central administrator. In addition, because multiple computers verify the records, it is difficult for third parties to forge or alter them. The most representative applications of blockchain technology are smart contracts and virtual currencies such as Bitcoin. Among Bitcoin's characteristics, decentralization facilitates the sharing of control between users and ID providers, allowing users to manage their own identities [13].
The NEXTLEAP system was developed as a decentralized alternative with enhanced privacy protection. Built as a federated ID system, it uses algebraic MAC-based blind signatures to enhance privacy protection, and the security of messages exchanged between different servers was verified through distributed IDs [14]. The blockchain protocol "Ouroboros Praos" was developed to prove security in a semi-synchronous setting, in which message transmission can be delayed, under fully adaptive corruption [15].
Appl. Sci. 2020, 10, 7134 4 of 20
The ChainAnchor system was introduced to provide identities that can be used for privacy protection. It can read and verify transactions on a permissioned blockchain, create a read-only list of the public keys of anonymous members, and perform simple lookups of entities on the blockchain [16]. A protocol for an identity management system built on Bitcoin has also been proposed; it facilitates the sharing of control between users and ID providers and verifies the relationship between the rights and responsibilities of users and ID issuers [13].
In addition, various studies are being conducted to address security issues and operate more efficient systems with blockchain. A new practical paradigm named Thunderella was proposed, which replicates the system state by combining an asynchronous path with a synchronous fallback path [17]. Another study provides the first formal analysis of Bitcoin's target (re)calculation function in the cryptographic setting; it extends the q-bounded model of the Bitcoin backbone protocol, which established the properties of Bitcoin's basic blockchain data structure, and shows how to build a robust public transaction ledger on top of it [18]. To strengthen the security of deployed systems, abstract blockchain protocols can be defined and their appropriate security properties identified; this line of work shows that Nakamoto's blockchain protocol satisfies these properties and proves that they suffice for general applications [19]. For security proofs, Bitcoin has been modeled as a ledger functionality in Canetti's (G)UC framework, and Bitcoin's protocol has been shown to realize this functionality [20]. A blockchain-based secure data sharing mechanism for Vehicular Networks (VNs) has also been proposed; to manage service provisioning efficiently, it introduces an edge service provider alongside regular nodes [21]. The dual decentralized network blockchain uses a permissioned blockchain for increased processing speed and security, while dividing the system into multiple layers so that application developers can use it openly and without permission [22].
The public blockchain is based on a distributed ledger and, in terms of opening and managing the transaction ledger, is classified as a public ledger limited to P2P-based financial transactions [23]. To overcome these limitations, a private blockchain can be used so that CDM data are shared only by authorized users, such as those in the medical sector. In addition, by using digital identifiers, this approach makes participation in data services easier and safer [24].

Common Data Model (CDM)
In medical and pharmaceutical research, the management of sensitive medical information such as patient data, along with related safety issues, is drawing attention both domestically and abroad, and the medical community is interested in conducting highly objective and safe research through the use of a CDM. Consequently, CDM adoption is increasing in domestic and overseas hospitals. CDM systems are used to analyze large volumes of real-world data; for example, a CDM facilitates analysis across data sources by standardizing the terms for drug use, medical events and procedures, the data structure, and data interpretation [25].
Electronic Health Records (EHRs), which are widely used in hospitals, are rapidly becoming ubiquitous in the medical field, but their use in biomedical and clinical research is limited by interoperability issues and technical requirements. Software that interacts directly with previously constructed EHR data, using the extracted CDM data, can therefore make the data more accessible to users with limited computing expertise or domain knowledge, including EHR-based users, and accelerate related research. For example, PatientExploreR is a scalable application built on the R/Shiny framework that interfaces with a relational database of EHR data in Observational Medical Outcomes Partnership CDM (OMOP-CDM) format; it generates interactive, dynamic patient-level reports and provides clinical data without programming [26].
Based on the EHR system, CDM has been expanded and built to enable the analysis and comparison of Adverse Drug Reactions (ADRs), whose data structures are integrated with other external organizations. It provides a basis for collaborative research, analysis, and comparison between institutions in which the same type of CDM has been built, and provides an environment in which the same research can be conducted simultaneously on a variety of data sources [27].
The use of standard Health Information Exchange (HIE) data is increasing because of its interoperability. However, the integration of HIE data into EHR systems has not yet been studied. To ascertain the quality of these data, referral documents in Health Level 7 (HL7) Clinical Document Architecture (CDA) format were converted to a CDM, making HIE data available for longitudinal analysis and identifying data quality levels. The CDA documents were used to define mapping rules for CDA-to-CDM conversion, confirming that the conversion was possible [28]. In addition, EHRs have been successfully transformed into the common data model of Observational Health Data Sciences and Informatics (OHDSI), which can provide various opportunities for researchers and data holders [29]. Data integration is an important task in healthcare informatics and strongly affects the value that can be obtained from existing health information data, especially when the data are collected through different heterogeneous systems. Research using the OMOP-CDM, a common data model that unifies the data structure of each institution, is continuously being conducted in several countries. In France, a feasibility assessment was conducted for implementing the national electronic health record with a CDM, which standardizes data and facilitates data exchange, sharing, and storage. A CDM also provides tools for data-quality assessment, model integration, visualization, and analysis. An extract-transform-load process was implemented to feed data from the French healthcare system into the OMOP-CDM, and, as a result, 17 vocabularies corresponding to the French context were added to the OMOP-CDM concepts [30].

Blockchain Application in CDM Environment
Medical data are legally defined as sensitive personal information, so their use is restricted. Using established healthcare data requires either de-identification of personal information or individual consent, which restricts data linkage and utilization. In addition, security threats must be addressed before medical data can be used safely.
Medical data involve a conflict between protecting and utilizing personal data. As public data become more open, cases of personal information leakage and re-identification are increasing, and data provision is limited by information security and personal information protection concerns. By nature, medical data are limited in scope to medical information, excluding non-medical data, and differ from the data in traditional databases. Rather than pursuing high performance, it is necessary to develop technology and build an environment that provides a high level of security, considering the characteristics of medical information. A more robust system is therefore needed to solve information security problems such as medical data leakage and re-identification, and this paper presents one.
In this paper, we study blockchain technology for handling the authority and management of medical data, together with a data exchange model for linking data between institutions. Blockchain can protect personal information and strengthen self-determination through secure data exchange or smart contracts. For the data exchange model, there is the CDM method, developed for exchanging and analyzing distributed, stored data. Research applying blockchain technology to the CDM environment is at an early stage, and specific implementation cases have rarely been published.

The Baseline CDM Model
Data extracted from EMRs tend to be stored in different relational database schemas. Figure 1 illustrates the conventional concept of CDM and its operation scheme derived from several EMR sources in hospitals. To collect and integrate the clinical data of multiple hospitals, it is necessary to resolve the heterogeneity of data structures and formats, differences in data quality and quantity, technical limitations of interoperability, and security issues. CDM should support linking common analysis codes across EMR resources so that research institutions can perform integrated data analysis without leaking sensitive personal information. Building a unified CDM is an essential step toward supporting a distributed research network. Observational Health Data Sciences and Informatics (OHDSI) is also pursuing the goal of developing open-source tools based on the OMOP-CDM and establishing a distributed research network [31]. More than 160 institutions around the world participate in OMOP and conduct joint research through distributed research networks. OHDSI aims to build a system that supports interoperability in order to make use of open research results [32].

The Secure Operation Scheme for the SC-CDM
The SC-CDM aims to provide a distributed platform for the CDM cloud, from transmission to utilization, including workflow process communications that benefit health professionals and patients.
Given the diversity of organizations in the SC-CDM, the CDM format is important for efficient data exchange between organizations in the network, and it maximizes the use of existing data resources. The CDM has been constructed and supplied by several providers, and its data are separated and disseminated by clinical researchers. Figure 2 illustrates the concept of the SC-CDM. As shown in Figure 2, the trust manager of hospital A signs a CDM that is accessed by hospital B. More specifically, the CDM in Node A is signed with the private key (PrivateKey_A) corresponding to the public key (PublicKey_A) held by Node B. We denote this as CDM_{A→B}. The trust manager of hospital B, operating in Node B, verifies the signed CDM with PublicKey_A.
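The sign-and-verify step between the two trust managers can be illustrated with a textbook hash-then-sign RSA sketch. It is a toy under stated assumptions: the tiny Mersenne-prime modulus, the unpadded signature, and the function names `sign_cdm` and `verify_cdm` are all hypothetical, and it does not parse X.509 certificates. It only shows that a signature made with PrivateKey_A verifies under PublicKey_A and fails once the CDM bytes change.

```python
import hashlib

# Toy RSA parameters built from two Mersenne primes (2**31 - 1 and 2**61 - 1).
# They are far too small and well known for real use; a deployment would take
# keys from X.509 certificates and use a padded scheme (e.g., RSA-PSS) or ECDSA.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537                          # public exponent (stands in for PublicKey_A)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (stands in for PrivateKey_A); Python 3.8+


def _digest_as_int(cdm: bytes) -> int:
    # Map the SHA-256 digest of the CDM into the range [0, N).
    return int.from_bytes(hashlib.sha256(cdm).digest(), "big") % N


def sign_cdm(cdm: bytes) -> int:
    """Trust manager of hospital A: sign the CDM digest with the private exponent."""
    return pow(_digest_as_int(cdm), D, N)


def verify_cdm(cdm: bytes, signature: int) -> bool:
    """Trust manager of hospital B: check the signature with the public exponent."""
    return pow(signature, E, N) == _digest_as_int(cdm)
```

A real deployment would load PublicKey_A from hospital A's X.509 certificate and rely on a vetted cryptographic library rather than hand-rolled RSA.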
The proposed CDM reference model is constructed by several CDM providers and CDM consumers that consist of hospitals and research institutes. Using this reference model, clinical researchers separate and disseminate the CDM data. The following describes the details of the CDM reference model.

• Cryptography is used to protect information, and hash values are used to manage large-capacity CDMs.
• DID is applied to mitigate the risks of centralized ID management, namely SPF and privacy leakage.
• A distributed ledger is used to provide data integrity and to share information through CDM signatures.
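The reference model relies on hash values to manage large-capacity CDMs. One way to compute such a hash without loading a whole CDM export into memory is to feed SHA-256 chunk by chunk. The sketch below assumes the CDM is available as a binary stream; the chunk size and the function name `hash_large_cdm` are illustrative choices, not details taken from SC-CDM.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an illustrative value, not from the source


def hash_large_cdm(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the SHA-256 hex digest of a CDM stream, reading it chunk by chunk."""
    h = hashlib.sha256()
    # iter(callable, sentinel) keeps calling read() until it returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    sample = io.BytesIO(b"example CDM export\n")
    print(hash_large_cdm(sample))
```

The digest is identical to hashing the whole file at once, so memory use stays bounded by the chunk size regardless of CDM volume.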

Applied Security and Privacy Method in SC-CDM
This section presents a method for handling security and privacy concerns in the SC-CDM, including the model for secure access to CDM data. CDM is represented as a digital asset recorded in a blockchain. The participants of the blockchain are the stakeholders of CDM: a CDM consumer, a CDM provider, and a service broker. The service broker acts as a mediator that preserves trustworthy delivery of CDM between the CDM consumer and the CDM provider. In the CDM blockchain, transactions can be described as three CDM-related operations. To access the permissioned CDM blockchain (represented as a node in Figure 3), X.509 is used to define the format of public key certificates. Node A represents a CDM provider, and Node B represents a CDM consumer. Two trust managers located in the service broker act as agents of the CDM provider operating in Node A and the CDM consumer operating in Node B, delivering the CDM in a trustworthy manner (represented as CDM_{A→B} in Figure 3). When a certificate is signed by a trusted certificate authority in SC-CDM, the CDM consumer can access the public key of the CDM provider. The SC-CDM can then establish secure communications with the components and validate CDM_{A→B}, which is digitally signed with the corresponding private key owned by the CDM provider. Table 1 shows the basic definitions of the components of SC-cloud. Table 2 summarizes the qualitative comparison of the characteristics of the baseline CDM model and the proposed cloud CDM model, SC-CDM.
In terms of fault tolerance, data sharing, privacy, and data integrity, SC-CDM improves on the baseline CDM model.

CDM Separate Block Encryption/Decryption Algorithm
Both asymmetric and symmetric keys are used on the blockchain. Given the encryption and decryption performance requirements of the cloud CDM service and its applications, the symmetric key is applied to CDM data encryption, and the symmetric protocol ensures data confidentiality while preserving CDM data resilience. When the multi-channel method is used, large data transfers between devices are distributed across channels, which increases communication efficiency and security and reduces information exposure during data transmission. Figure 4 shows how the encrypted CDM is shared between Node A and Node B in SC-CDM. To transmit CDM data, the CDM data owner divides each original CDM into two pieces and then encrypts the pieces using a symmetric key and a random Initialization Vector (IV). The encryption scheme is described by a key-generation algorithm (Gen) with a key space (K), and by encryption (Enc) and decryption (Dec) algorithms over a data space (D):
• Gen generates a random key (32 bytes) and an IV (16 bytes);
• Enc/Dec are based on the Advanced Encryption Standard (AES) in Cipher Block Chaining mode with an IV;
• D: the CDM derived from the EMR.
This scheme is used to ensure the data confidentiality and integrity of SC-CDM. Every CDM has a different key and IV, which increases security by exploiting randomness. A blockchain is used to exchange the encryption and decryption information of a CDM between a CDM producer and a CDM consumer through a transaction.
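Gen can be sketched with Python's `secrets` module, which draws from the operating system's cryptographically secure random source. The sketch also shows one hypothetical way to divide a serialized CDM into the two pieces mentioned above; the midpoint split and the names `gen` and `split_cdm` are assumptions, since the text does not say where the CDM is divided.

```python
import secrets
from typing import Tuple

KEY_BYTES = 32  # random key length stated for Gen (matches AES-256)
IV_BYTES = 16   # IV length stated for Gen (one AES block)


def gen() -> Tuple[bytes, bytes]:
    """Gen: draw a fresh random key and IV; called once per CDM."""
    return secrets.token_bytes(KEY_BYTES), secrets.token_bytes(IV_BYTES)


def split_cdm(cdm: bytes) -> Tuple[bytes, bytes]:
    """Divide a serialized CDM into two byte-level pieces.

    Splitting at the midpoint is a hypothetical choice; the text only says
    that the CDM is divided into two pieces.
    """
    mid = len(cdm) // 2
    return cdm[:mid], cdm[mid:]
```

Calling `gen()` per CDM, rather than reusing one key and IV, is what gives each CDM the distinct key and IV described above.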
The transaction has four components:
• IV;
• Shared key (generated by the symmetric key protocol);
• Two hash values generated by SHA-256.
The IV adds randomness at the starting point of the chained block encryption mode, and the random IV yields a distinct key file for each encrypted file. Each ciphertext block is generated by encrypting the XOR of the previous ciphertext block and the current plaintext block.
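The four-component transaction can be modeled as a small record. What the two SHA-256 values cover is not specified here, so this sketch assumes they are digests of the two encrypted pieces, letting a consumer check each downloaded piece; the field and function names are hypothetical. It also assumes the shared key is protected for the consumer (for example, encrypted under the consumer's public key) before publishing, which the sketch does not implement.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class CdmTransaction:
    iv: str          # hex-encoded IV used for CBC encryption
    shared_key: str  # hex-encoded symmetric key (assumed already protected for the consumer)
    hash_1: str      # SHA-256 of encrypted piece 1 (assumed meaning of the first hash)
    hash_2: str      # SHA-256 of encrypted piece 2 (assumed meaning of the second hash)


def make_transaction(iv: bytes, shared_key: bytes, piece_1: bytes, piece_2: bytes) -> CdmTransaction:
    """Build the transaction a CDM producer would submit to the blockchain."""
    return CdmTransaction(
        iv=iv.hex(),
        shared_key=shared_key.hex(),
        hash_1=hashlib.sha256(piece_1).hexdigest(),
        hash_2=hashlib.sha256(piece_2).hexdigest(),
    )


def to_record(tx: CdmTransaction) -> Dict[str, str]:
    """Flatten the transaction into a plain dict, e.g., for submission to a ledger."""
    return asdict(tx)


def piece_matches(tx: CdmTransaction, index: int, piece: bytes) -> bool:
    """Consumer-side check that a downloaded encrypted piece matches its recorded hash."""
    expected = tx.hash_1 if index == 1 else tx.hash_2
    return hashlib.sha256(piece).hexdigest() == expected
```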
In Cipher Block Chaining (CBC) mode, the previous ciphertext block is XORed with the next plaintext block before that block is passed to the encryption algorithm [33]. If each key is only ever used to encrypt a single CDM, one could get away with a fixed IV. A random IV, however, ensures that each CDM encrypts differently, so that seeing multiple CDMs encrypted with the same key gives an attacker no more information than seeing a single long message.
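The chaining rule can be written as C_0 = IV and C_i = E_K(P_i ⊕ C_{i−1}) for encryption, with P_i = D_K(C_i) ⊕ C_{i−1} for decryption. The sketch below implements this CBC chaining with PKCS#7 padding. Because AES is not in the Python standard library, it substitutes a toy 4-round Feistel permutation built from HMAC-SHA256 for the AES block cipher; that stand-in is NOT secure and exists only so the example is self-contained. A real implementation would call AES-256 from a vetted cryptographic library with the 32-byte key and 16-byte IV produced by Gen.

```python
import hashlib
import hmac
import secrets

BLOCK = 16   # block size in bytes (same as AES)
ROUNDS = 4   # Feistel rounds for the toy stand-in cipher


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function: HMAC-SHA256 of (round number || half-block), truncated to 8 bytes.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 16-byte permutation standing in for AES. NOT secure; illustration only."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt_block: undo the Feistel rounds in reverse order."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def pad(data: bytes) -> bytes:
    # PKCS#7: append n copies of byte n, where 1 <= n <= BLOCK.
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    if not data:
        raise ValueError("empty input")
    n = data[-1]
    if not 1 <= n <= BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    data, prev, out = pad(plaintext), iv, []
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, _xor(data[i : i + BLOCK], prev))  # C_i = E_K(P_i xor C_{i-1})
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i : i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))  # P_i = D_K(C_i) xor C_{i-1}
        prev = block
    return unpad(b"".join(out))


if __name__ == "__main__":
    key, iv = secrets.token_bytes(32), secrets.token_bytes(16)
    ct = cbc_encrypt(key, iv, b"example CDM piece")
    print(len(ct), cbc_decrypt(key, iv, ct))
```

Encrypting the same piece under two different IVs produces different ciphertexts, which is the property the random IV is meant to provide.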

Experimental Setup
In this experiment, we built a prototype called SC-Blockchain to evaluate the designed SC-CDM; it runs on the Linux operating system (CentOS 7.x, IPFS 0.42). We employed SC-Blockchain, a lightweight blockchain that performs well at timestamping data but can handle only very small amounts of data at a time. Because the hash was added to the blockchain when the data were uploaded to the IPFS network, the signed document was timestamped in a tamper-proof way. IPFS, a web protocol for data decentralization, uses content addressing to uniquely identify each file in a global namespace connecting all devices. To provide tamper-proof content retrieval, IPFS refers to content by its hash, so content retrieved under a given hash cannot be altered without the mismatch being detected. SC-Blockchain also manages the distributed ledger, persisting a separate storage structure in IPFS.
To deliver a CDM request from clients to SC-CDM, an agent acting on behalf of the client interacts with tasks via a message broker. SC-CDM participants generate an ID based on X.509 and store their public keys in SC-Blockchain. Requests, responses, and CDM data delivery are handled through a separate channel, and the CDM processing related to requests and responses is performed through SC-Blockchain as a trust channel. Users' CDM signing and verification are performed through Celery for asynchronous queue processing, and message queues are processed with Redis, an in-memory NoSQL data store, to improve performance.
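The asynchronous handling of signing and verification can be illustrated with a standard-library producer/consumer queue standing in for Celery and Redis. This is a hypothetical sketch: `sign_or_verify` is a placeholder that only hashes the CDM, and threads plus `queue.Queue` replace the real message broker. The point is that request handlers enqueue work and return, while workers process it in the background.

```python
import hashlib
import queue
import threading
from typing import Dict, Optional, Tuple

Job = Optional[Tuple[str, bytes]]  # (job id, CDM bytes); None is the stop sentinel


def sign_or_verify(cdm: bytes) -> str:
    # Hypothetical placeholder for the real CDM signing/verification task.
    return hashlib.sha256(cdm).hexdigest()


def worker(tasks: "queue.Queue[Job]", results: Dict[str, str]) -> None:
    while True:
        job = tasks.get()
        try:
            if job is None:  # sentinel: stop this worker
                return
            job_id, cdm = job
            results[job_id] = sign_or_verify(cdm)
        finally:
            tasks.task_done()


def run_jobs(jobs: Dict[str, bytes], n_workers: int = 2) -> Dict[str, str]:
    tasks: "queue.Queue[Job]" = queue.Queue()
    results: Dict[str, str] = {}
    threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job_id, cdm in jobs.items():
        tasks.put((job_id, cdm))  # a request handler would return right after enqueueing
    for _ in threads:
        tasks.put(None)  # FIFO order: sentinels are dequeued only after every job
    for t in threads:
        t.join()
    return results
```

With Celery, the queue would live in Redis and workers would run as separate processes, but the enqueue-then-process flow is the same.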

Construction of Storage Structure for SC-CDM
IPFS is a P2P distributed storage system for storing large-capacity files or data [34]. Storing data in the IPFS network file system yields an IPFS hash for that data. Figure 5 shows the network bandwidth of IPFS in SC-CDM. When a large amount of CDM data is shared among Care Delivery Organizations (CDOs) via the IPFS network, the actual CDM data are encrypted and stored separately in IPFS, and the IPFS hash is stored in and obtained from the blockchain [35]. Figure 6 shows IPFS being started as a daemon.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 11 of 20

X.509 Based Key Management
In this experiment, the ID of the hospital administrator was generated with the OpenSSL library in Python, and an X.509 certificate was generated and maintained in the administrator's own wallet. The public key associated with the generated private key was stored in the distributed ledger [36]. The public key, in PEM format, is shown below; it is used to verify the delivered CDM and is stored in IPFS.
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAsZhhPfQsP3MTrCnn7wsJ
wLX/VWxg1/Qf4Usv/uIfJGta2hcg2hg+TzSqv3bHjFssKBCSp7QnxC1dITt1QbAK
oifDT0dEOoOyYZ7o1xTZBfJUZBa6w7vAnMmOnKPdf/QXgOZjHXr+c4mPjOB5OBVA
wIaiO3ASSACkdeGZ2lZxrZe3tXYeX3Ng8a/nJ8OSdxWuoMKTQ3roooi8I4iC2SXU
5sgb9TxAF/nnm8q4vEiMPBi8d7FqsVPfh4tA2PoESqO/4g4c295Awfm2FT7ut8Z5
n1YyS3PZP7AcGnInWcZ6K5NAsUzsA60+WAgXIuUb4lzuCX9ZhhHM+/utu50in0Zd
xQIDAQAB
-----END PUBLIC KEY-----
Figure 7 presents an example of how to encrypt and decrypt data in a file named a.out with an encryption program that implements the block encryption and decryption algorithm, and how to sign and verify the data with an asymmetric key algorithm. The example illustrates how the validity and accuracy of data are ensured and preserved between two parties, the CDM provider and the CDM consumer. The CDM provider encrypts data with the public key in the file yrpubkey.pem, which is distributed by SC-CDM; the decryption key is derived from the DID. To audit the integrity of a CDM delivered to SC-CDM from outside, the CDM consumer verifies the CDM's signature (in signfile.txt), or the signatures of parts of the CDM to be used later over encrypted data, with yrprikey.pem, and uploads the expected responses to the trust manager of SC-CDM.
Our integrity auditing protocol is an asymmetric-key-based integrity auditing scheme. With this protocol, a client can audit the integrity of a CDM using an asymmetric key derived from its DID, without generating additional metadata.
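The division of roles in hash-then-sign can be shown with textbook RSA. This is a toy sketch under loudly stated assumptions: the tiny primes 61 and 53, the exponent 17, and reducing a SHA-256 digest modulo n are for illustration only and are completely insecure. Real deployments use keys like the 2048-bit key above together with a padding scheme such as RSA-PSS or PKCS#1 v1.5. The sketch does not read yrpubkey.pem or yrprikey.pem; it requires Python 3.8+ for `pow(e, -1, phi)`.

```python
import hashlib

# Toy textbook-RSA key pair (INSECURE, illustration only)
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: modular inverse of e


def digest_int(message: bytes) -> int:
    # Hash the CDM, then reduce into Z_n (acceptable only for this toy modulus)
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    # Signing uses the private key (d, n)
    return pow(digest_int(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    # Verification uses the public key (e, n)
    return pow(signature, e, n) == digest_int(message)


cdm = b"example CDM delivered by the provider"
signature = sign(cdm)
print(verify(cdm, signature))  # True
```

Only the holder of d can produce a signature that verifies under (e, n), so anyone holding the registered public key can audit a delivered CDM without extra metadata.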

Secure Proof of Ownership over Secure Transaction
Proof of ownership is similar to the integrity auditing protocol [37], but with the roles of prover and verifier swapped. In our construction, only clients holding intact data can pass the proof-of-ownership protocol. Moreover, because the protocol is performed over encrypted data [38] and the delivered CDM is persisted in IPFS and referenced through the blockchain, the protocol does not expose any information about the CDM.
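One common way to build a proof of ownership over encrypted data is a Merkle-tree challenge-response: the verifier keeps only the root digest, challenges random block indices, and the prover must return each challenged block with its authentication path. The sketch below assumes that construction and 1 KiB blocks; it is not necessarily the exact protocol used in SC-CDM, and all function names are hypothetical.

```python
import hashlib
import os
import random


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split(data: bytes, size: int = 1024):
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def build_tree(blocks):
    # levels[0] holds leaf hashes; each next level hashes pairs of the previous one
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]       # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    # Prover: sibling hashes on the path from leaf `index` up to the root
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return path


def check(root, block, index, path):
    # Verifier: recompute the root from the challenged block and its path
    node = h(block)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


encrypted_cdm = os.urandom(5000)        # stands in for an encrypted CDM
blocks = split(encrypted_cdm)           # 5 blocks of up to 1 KiB
levels = build_tree(blocks)
root = levels[-1][0]                    # the verifier keeps only this digest

challenge = random.sample(range(len(blocks)), 2)
ok = all(check(root, blocks[i], i, prove(levels, i)) for i in challenge)
print(ok)  # True for a prover holding the intact encrypted data
```

A client missing or holding a corrupted block cannot produce a path that hashes to the stored root, and the verifier never sees plaintext, since the tree is built over ciphertext.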
SC-Cloud keeps CDM data in IPFS only as secure temporary storage for the transferred CDM data, which keeps the file information of the data confidential. However, IPFS currently provides no separate access control for external access; this limitation remains to be resolved.
To avoid keeping large volumes of CDM data in the distributed ledger of SC-CDM, IPFS is used for effective and secure management of CDM data, and the ledger holds only a transaction for delivering the CDM. The CDM transaction is signed over its SHA-256 digest. Table 3 shows the contents of the transaction for CDM.
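A delivery transaction only needs a stable digest for signing if every peer serializes it identically. The sketch below shows one way to do that with canonical JSON and SHA-256. The class `CDMTransaction` and its fields are hypothetical placeholders; the actual transaction fields are those listed in Table 3. The signature itself (computed over the digest with the sender's private key) is not shown.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class CDMTransaction:
    # Hypothetical fields; the real contents are those listed in Table 3
    sender_did: str
    receiver_did: str
    ipfs_hashes: tuple
    note: str = "CDM"

    def digest(self) -> str:
        # Canonical JSON (sorted keys, no whitespace) so every peer hashes identical bytes
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


tx = CDMTransaction("did:example:provider", "did:example:consumer",
                    ("piece-1-hash", "piece-2-hash"))
tampered = replace(tx, receiver_did="did:example:attacker")
print(tx.digest() != tampered.digest())  # True: any field change alters the digest
```

Keeping only such small records on-chain, with the CDM itself in IPFS, is what keeps the ledger lightweight.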

Privacy Leakage Resilience of SC-CDM
Because SC-CDM is built on asymmetric-key encryption, it prevents leakage of personal information during the integrity audit process. In the secure operation scheme, the public key of the CDM consumer is maintained in SC-Blockchain. Figure 8 shows the transaction log, with the note "CDM", for registering the public keys of the CDM provider and the CDM consumer in SC-Blockchain. The CDM Separate Block Encryption/Decryption Algorithm (in Section 2.2.4) is used to build the encrypted CDM, which is stored in IPFS together with its metadata for signing and verification. The transactions of new blocks can be verified locally by their hash values, and transactions from the IPFS network can also be verified. Transactions should also be stored in IPFS. In the public-key-based solution, subsequent clients need to know the public key generated by the CDM uploader in order to audit integrity, which is why they can immediately learn who holds a record. Figure 9 shows the transaction log related to consensus.
Figure 10 shows the contents of a blockchain consisting of three blocks (0, 1, and 2) generated in chronological order. The genesis block is the first block in any blockchain-based protocol [39]; in Figure 10, it is block number 0. Block number 1 consists of three CDM transactions, t1 to t3, where, in causal order, t1 happens before t2 and t3 occurs last. The output of the hash function H is the digest, a hash value. For the CDM, transaction t1 is issued to register the public key owned by the CDM provider, and transaction t2 is issued to register the public key owned by the CDM consumer. Transaction t4 in block b2 represents the delivery of a signed CDM by the CDM provider; the CDM consumer accesses it to handle the encrypted CDM. Eventually, the signed CDM is verified by a private key of the CDM consumer in SC-CDM, which is used as part of transaction t5. Only one block can be added to the blockchain at a time [40], and each block is verified to ensure that it follows in sequence from the previous block.
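The sequencing check described above, where each block must follow from the previous one, can be sketched as a minimal hash-linked chain. The block layout (index, timestamp, transactions, prev_hash) and the transaction labels are illustrative assumptions mirroring Figure 10, not SC-Blockchain's actual data structures; consensus and leader commitment are omitted.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    # Hash every field except the stored hash itself
    body = {k: block[k] for k in ("index", "timestamp", "transactions", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_block(index, transactions, prev_hash):
    block = {"index": index, "timestamp": time.time(),
             "transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block


def verify_chain(chain) -> bool:
    for prev, cur in zip(chain, chain[1:]):
        # Each block must point at its predecessor and hash to its stored value
        if cur["prev_hash"] != prev["hash"] or cur["hash"] != block_hash(cur):
            return False
    return chain[0]["hash"] == block_hash(chain[0])


genesis = make_block(0, [], "0" * 64)   # block 0: the genesis block
b1 = make_block(1, ["t1: register provider public key",
                    "t2: register consumer public key", "t3"], genesis["hash"])
b2 = make_block(2, ["t4: deliver signed CDM"], b1["hash"])
chain = [genesis, b1, b2]
print(verify_chain(chain))      # True
b1["transactions"][0] = "t1: forged"
print(verify_chain(chain))      # False: b1 no longer hashes to its stored value
```

Changing any earlier transaction breaks that block's hash and, through prev_hash, every block after it, which is why committed CDM records are tamper-evident.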
All CDM transaction records are kept in the blockchain and shared with peers, including members of the CDM service system who have joined the blockchain network. Each block of transactions is committed by the leader.

Analysis of SC-CDM
In the cloud CDM model presented in this study, CDM data are divided and a multi-channel method is used to increase security: each CDM datum is divided into two pieces, and each piece is stored in IPFS, yielding two IPFS hashes.
When a CDM data owner transmits CDM data, each original CDM datum is divided into two pieces, and the pieces are encrypted with a symmetric key and a random vector (IV). The encrypted pieces are then added to the IPFS network, which returns an IPFS hash for each. Finally, the symmetric key, random vectors, and IPFS hashes are grouped into a single structure and sent, along with the signature, to the requester of the CDM data through the blockchain. After verifying the owner of the CDM data with the signature, the requester retrieves the divided files using the IPFS hashes in the received structure. The encrypted files are then decrypted with the symmetric key and random vectors to obtain the divided CDM data, and the requester recombines the pieces by the same rule the owner used to divide them, recovering the original CDM data. Even a party that retrieves the IPFS files through their IPFS hashes obtains only encrypted data that must still be decrypted, so the information is not exposed to anyone except users authorized by the CDM data owner. Figure 11 shows the overall process for building a secure CDM in SC-Blockchain. As shown in Figure 11, H1 and H2 are the hash values of CDM1 and CDM2, which are encrypted with the CDM Separate Block Encryption/Decryption Algorithm described in Section 2.2.4.
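The owner and requester steps above can be traced end to end in a compact sketch. Several assumptions keep it short and self-contained: a dictionary keyed by SHA-256 digests stands in for IPFS, the splitting rule is simply "first half / second half", and a toy SHA-256 counter-mode keystream replaces the block cipher of Section 2.2.4. The signature that accompanies the bundle on the blockchain is omitted.

```python
import hashlib
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Toy cipher: XOR with SHA-256(key || iv || offset) blocks; same call decrypts
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + iv + i.to_bytes(8, "big")).digest()
        out += bytes(a ^ b for a, b in zip(data[i:i + 32], pad))
    return bytes(out)


store = {}   # stands in for IPFS: content hash -> stored bytes


def add(data: bytes) -> str:
    address = hashlib.sha256(data).hexdigest()
    store[address] = data
    return address


def owner_share(cdm: bytes) -> dict:
    half = len(cdm) // 2                        # splitting rule: first / second half
    key = os.urandom(32)                        # one random symmetric key
    ivs = [os.urandom(16), os.urandom(16)]      # one random vector per piece
    pieces = [cdm[:half], cdm[half:]]
    hashes = [add(keystream_xor(key, iv, p)) for iv, p in zip(ivs, pieces)]
    # Bundle of key, IVs, and hashes that would be sent (signed) via the blockchain
    return {"key": key, "ivs": ivs, "hashes": hashes}


def requester_fetch(bundle: dict) -> bytes:
    # Fetch each encrypted piece by hash, decrypt it, and recombine in order
    pieces = [keystream_xor(bundle["key"], iv, store[h])
              for iv, h in zip(bundle["ivs"], bundle["hashes"])]
    return b"".join(pieces)


cdm = b"example CDM record contents for two-channel delivery"
bundle = owner_share(cdm)
assert requester_fetch(bundle) == cdm
```

Anyone can fetch the two stored objects by hash, but without the key and IVs from the bundle they hold only ciphertext, which is the property the paragraph above relies on.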
Figure 11. The overall process for building a secure CDM in SC-Blockchain.
In the IPFS network, immutable files are stored and downloaded using a cryptographic hash function that returns a fixed value for the given data, and the identity of the user who shares CDM data can be proven through signing in the encryption and decryption process of the CDM partitioning algorithm. Therefore, user access control is possible, and because blockchain is used, CDM data can be shared safely on the basis of trust.

Performance Evaluation
By using IPFS to store files, the CDM cloud can provide a repository that persists CDM files. When a CDM file is added to an IPFS node, the node returns a Content Identifier (CID) for the file. The CID is derived directly from running the file through a cryptographic hash function, so every unique file has a unique CID, and anyone can retrieve a particular piece of content by providing its CID. When a node is connected to the IPFS network, the content it holds is regularly advertised over the IPFS Distributed Hash Table (DHT). IPFS nodes treat stored data like a cache, so there is no guarantee that data will continue to be stored unless its CID is pinned.
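The cache-versus-pin distinction can be modeled in a few lines. This is a toy model: the class and method names are hypothetical and only loosely mirror the `ipfs add`, `ipfs pin add`, and `ipfs repo gc` commands, and the CID is simplified to a SHA-256 hex digest.

```python
import hashlib


class CachingNode:
    """Toy model of IPFS storage semantics: unpinned blocks are only cached."""

    def __init__(self):
        self.blocks = {}      # simplified CID -> content
        self.pinned = set()

    def add(self, data: bytes, pin: bool = True) -> str:
        # Locally added content is pinned by default, as `ipfs add` does
        cid = hashlib.sha256(data).hexdigest()
        self.blocks[cid] = data
        if pin:
            self.pinned.add(cid)
        return cid

    def cache(self, data: bytes) -> str:
        # Content retrieved from the network is kept only as an unpinned cache entry
        return self.add(data, pin=False)

    def pin(self, cid: str) -> None:
        self.pinned.add(cid)

    def gc(self) -> None:
        # Garbage collection drops every block that is not pinned
        self.blocks = {c: d for c, d in self.blocks.items() if c in self.pinned}


node = CachingNode()
mine = node.add(b"CDM uploaded by this hospital")
fetched = node.cache(b"CDM read from another peer")
node.gc()
print(mine in node.blocks, fetched in node.blocks)  # True False
```

For SC-CDM this means that CDM files meant to persist should be pinned on at least one node; data merely read from peers may disappear after garbage collection.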
In this experiment, CDM files imported into SC-CDM are stored and shared in IPFS. Therefore, to evaluate IPFS storage performance, write performance is examined as a function of CDM file size. When an IPFS node retrieves data from the network, it keeps a local cache of the data for future use, which takes up space on that node. The CDM files used for writing have sizes of 1 KB, 10 KB, 100 KB, 1 MB, and 10 MB. Figure 12 shows that the elapsed time for writing to IPFS grows linearly with file size.
The sharp increase in elapsed time for the 10 MB file is explained by the fact that IPFS stores data in blocks of 256 KB. A 1 MB file is written as four blocks, whereas a 10 MB file requires ten times as many, 40 blocks. Conversely, files of 1, 10, and 100 KB each fit in a single block, which explains why their write times differ little.
In SC-CDM, data analysis is performed with the analysis tools inside SC-CDM. Registered researchers must access CDM data for this purpose, and they do so through IPFS nodes that operate inside SC-CDM. This differs from the current CDM practice of delivering data to remote researchers. This experiment therefore evaluates the performance of SC-CDM, in which data are analyzed by accessing them within SC-CDM. To show the benefit of SC-CDM's data access, two IPFS nodes were registered as peers and composed into a swarm cluster with the private network function. To evaluate the configured swarm cluster, the elapsed time of data access was measured on the local IPFS node inside SC-CDM. To represent the existing CDM operation method, the elapsed time of the same data-access operation on a remote IPFS node was measured for comparison. For an objective evaluation, network delays were ignored in the remote node's data-access time: the hash value of the data to be read was passed to the other peer of the swarm cluster, and the elapsed time of that peer's data access was measured.
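The block counts quoted above follow from simple arithmetic on the default fixed-size chunker, which splits files into 262,144-byte (256 KiB) chunks. The sketch assumes binary units (1 KB = 1024 bytes) and counts only leaf chunks; a multi-chunk file also gets a root node linking its chunks, which is not counted here.

```python
import math

CHUNK = 256 * 1024   # default IPFS fixed-size chunk: 262,144 bytes


def leaf_chunks(size_bytes: int) -> int:
    # Number of 256 KiB leaf blocks a file is split into (at least one)
    return max(1, math.ceil(size_bytes / CHUNK))


sizes = {"1 KB": 1024, "10 KB": 10 * 1024, "100 KB": 100 * 1024,
         "1 MB": 1024 ** 2, "10 MB": 10 * 1024 ** 2}
for label, size in sizes.items():
    print(label, leaf_chunks(size))
# 1 KB 1
# 10 KB 1
# 100 KB 1
# 1 MB 4
# 10 MB 40
```

The three smallest files all collapse to one block, which is consistent with their nearly identical write times in Figure 12.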
The files used for the access tests have sizes of 1 MB, 10 MB, 100 MB, 200 MB, 500 MB, 750 MB, and 1 GB. Each access was repeated 20 consecutive times, and the mean and standard deviation of the elapsed time were taken as the experimental result. Table 4 presents the evaluation of accessing CDM data from IPFS in a swarm cluster. Figure 13 shows the elapsed time and its standard deviation when CDM clients access the local IPFS node of SC-CDM (blue line), and when, as in the traditional CDM environment, CDM clients access remote IPFS nodes (yellow line). As shown in Figure 13, the data-access time for both local and remote IPFS increases with the size of the CDM data. Table 4 compares SC-CDM (local access) with the conventional method (remote access) and shows that the gap widens as the CDM file grows: remote access took 1.08 times as long as local access for a 1 MB file, 3.65 to 8.77 times as long for files from 10 MB to 100 MB, and 9.32 to 9.93 times as long for files from 200 MB to 1 GB. This experiment confirms the advantage of maintaining CDM data inside SC-CDM.
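The measurement procedure (20 consecutive runs, mean and standard deviation, then a remote-to-local ratio) can be expressed as a small harness. The `read` callables are hypothetical placeholders for fetching a CID from the local or remote node, and whether the original study used the sample or population standard deviation is not stated, so the sample form here is an assumption.

```python
import statistics
import time


def measure(read, runs: int = 20):
    # Time `runs` consecutive calls; return (mean, sample standard deviation) in seconds
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        read()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def slowdown(remote_mean: float, local_mean: float) -> float:
    # Ratio of the kind reported in Table 4: remote elapsed time / local elapsed time
    return remote_mean / local_mean


# Placeholder workloads standing in for local and remote CID reads
local_mean, local_sd = measure(lambda: sum(range(10_000)))
remote_mean, remote_sd = measure(lambda: sum(range(50_000)))
print(f"remote/local = {slowdown(remote_mean, local_mean):.2f}")
```

Reporting the ratio rather than raw times makes results from machines of different speeds comparable, which is how the 1.08 to 9.93 figures above should be read.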

Discussion
In this paper, the basic CDM model in Figure 1, used for the common operation of EMR data from various hospitals, was extended to a cloud-based CDM model to design the architecture. SC-CDM was then proposed to provide safe operation of CDM data. To ensure data integrity, CDM data are encrypted, stored in IPFS, and decrypted after being accessed from IPFS. In terms of performance, the write time of CDM data blocks increased linearly with the creation time of P2P block IDs in IPFS. To confirm the advantage of SC-CDM's access to CDM data, a private-network swarm cluster was constructed. Compared with the existing CDM method of remote access, the SC-CDM approach was confirmed to be superior, with an access time about 10 times shorter when accessing 1 GB of CDM data.
Increasing demands for the joint use of data in hospitals are driving CDM deployments, and the proposed SC-CDM can serve as a major reference model for building networks involving multiple hospitals.

Conclusions
CDM for data sharing and utilization among medical institutions requires access to patient medical information and is used for disease research and customized medical care. To obtain meaningful results in research-network-based clinical studies on diverse patient data from hospitals, information assets need to be shared and utilized. The cloud CDM provides interoperability for the participation of multiple hospitals and serves as an information base for research on customized, user-centered healthcare. However, personal medical information must be managed reliably, safely, and transparently. The cloud CDM model proposed in this paper, SC-CDM, applies distributed IDs and blockchain technology for secure access control during CDM conversion and medical record access for the joint use of hospital data. In this paper, experiments were conducted on storing and maintaining CDM data in IPFS. They confirmed that the data can be fragmented and maintained in a way that ensures their safety, with the result of each transaction stored in the blockchain. The experiments also showed that data-storage performance improved for CDM composed of pieces.