Mapping of Data-Sharing Repositories for Paediatric Clinical Research—A Rapid Review

The reuse of paediatric individual patient data (IPD) from clinical trials (CTs) is essential to overcome specific ethical, regulatory, methodological, and economic issues that hinder the progress of paediatric research. Sharing data through repositories enables the aggregation and dissemination of clinical information, fosters collaboration between researchers, and promotes transparency. This work aims to identify and describe existing data-sharing repositories (DSRs) developed to store, share, and reuse paediatric IPD from CTs. A rapid review of platforms providing access to electronic DSRs was conducted. A two-stage process was used to characterize DSRs: a first step of identification, followed by a second step of analysis using a set of eight purpose-built indicators. From an initial set of forty-five publicly available DSRs, twenty-one DSRs were identified as meeting the eligibility criteria. Only two DSRs were found to be totally focused on the paediatric population. Despite an increased awareness of the importance of data sharing, the results of this study show that paediatrics remains an area in which targeted efforts are still needed. Promoting initiatives to raise awareness of these DSRs and creating ad hoc measures and common standards for the sharing of paediatric CT data could help to bridge this gap in paediatric research.


Introduction
Conducting clinical trials (CTs) in the paediatric population can be challenging due to specific ethical, regulatory, methodological, and economic issues [1][2][3]. Difficulties, such as the well-documented health equity issues in paediatric clinical research, can arise from the earliest stages of patient recruitment. For example, inequitable access to clinical care, institutional barriers and equity issues in the consent process, and barriers to participation related to study procedures hinder the work of research teams worldwide [4]. Even the final stages of research (i.e., publication/sharing of results) are not without complications; the long time lag between the completion of a clinical trial and the publication of results in a peer-reviewed article can in many cases limit public awareness of research [5]. Over the past 20 years, the need for greater transparency of ongoing and completed CTs has been in the spotlight [6][7][8][9][10][11]. From an institutional point of view, since the 2000s, there have been several regulatory initiatives aimed at promoting paediatric CTs. In the US, the Pediatric Research [...] private partnership between the European Union (EU) and the European pharmaceutical industry, represented by the European Federation of Pharmaceutical Industries and Associations (EFPIA).
The aim of this study was to describe and analyse, using eight indicators, existing data-sharing repositories (DSRs) that share individual patient data (IPD) from clinical trials, with a particular focus on paediatric IPD. This evaluation aimed to address barriers to data sharing, such as concerns about handling sensitive data among data managers and sponsors, and the lack of standardized policies among clinical trial funders, which can hinder the adoption of data-sharing practices [28].
This paper represents a pioneering effort to map all repositories where paediatric clinical trial data can be accessed by paediatric stakeholders. Given the well-documented challenges of conducting clinical trials in this population, it is imperative to maximize the use of data already collected. Through this effort, we have now established a clear roadmap for access to clinical trial data that will significantly benefit the paediatric research community.

Methods
A rapid review of existing CT DSRs was conducted through the literature database PubMed and the search engine Google in 2020 and updated in September 2023.
The PubMed search strategy consisted of five main steps: (1) identification of research questions (Supplementary Materials); (2) identification of keywords that best answer the research questions; (3) creation of search strings and application of the "within the last 10 years" and "English" search filters when using strings (Table S1); (4) selection and screening of articles, reporting keywords in the title and/or abstract, performed by two authors; (5) extrapolation, screening, and analysis of DSRs according to the criteria described below.
The keywords identified in step (2) were also used in the Google search, and the first hundred results were screened. Details of the search strategy are provided in Figure S1.
The following definition was used: "Data-sharing repository (DSR)", an information system set up to manage, archive, and provide access (share) to datasets from CTs. To be included in the analysis, DSRs had to provide access to IPD, or at least provide clear instructions for submitting an access request. IPD had to be publicly available in English. At least some of the data held in the DSR had to be from paediatric-only trials and/or from trials on paediatric and adult patients, from both industry-sponsored and academic-led trials. DSRs that did not meet the eligibility criteria or were no longer active were excluded.
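The eligibility criteria above can be read as a simple screening predicate. The sketch below is only an illustration: the `Repository` fields and the four example repositories are hypothetical and are not taken from the review, and the sponsor-type criterion is left out because the text does not state how it was operationalised.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    # All field names are hypothetical; they mirror the criteria listed in the text.
    name: str
    is_active: bool
    provides_ipd_access: bool
    has_access_request_instructions: bool
    available_in_english: bool
    includes_paediatric_trials: bool


def is_eligible(repo: Repository) -> bool:
    """Return True when a repository satisfies every screening criterion."""
    # IPD must be reachable directly or through a clearly described request route.
    ipd_reachable = repo.provides_ipd_access or repo.has_access_request_instructions
    return (
        repo.is_active
        and ipd_reachable
        and repo.available_in_english
        and repo.includes_paediatric_trials
    )


# Hypothetical candidates, for illustration only.
candidates = [
    Repository("Repo A", True, True, False, True, True),
    Repository("Repo B", True, False, True, True, True),
    Repository("Repo C", False, True, True, True, True),   # no longer active
    Repository("Repo D", True, False, False, True, True),  # no route to IPD
]
eligible = [repo.name for repo in candidates if is_eligible(repo)]
```

Expressing the criteria as one conjunction makes it explicit that failing any single criterion is enough for exclusion.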
DSRs that met the eligibility criteria were analysed and compared using a two-step process.
Firstly, publicly available information was searched on the DSR's website and tabulated. The following information was extracted: general features; type of data collected and related documentation; specific guidance on data composition/structure/format, relevance, and paediatric specificity (i.e., availability of paediatric CT data filtered by age group); legal provisions for uploading and reusing data; IT security measures and protocols. Some of these features, such as relevance and paediatric specificity, access to IPD, and data privacy measures, were classified as shown below:

• Relevance was assessed as the ability of each DSR to provide access to the IPD of the subjects included in the CTs.
• Controlled access: Access is permitted after the user submits a formal application requesting data access. The data requestor may need to provide a research protocol and analysis plan, including information on data management and plans for publication of results. The data can only be accessed and analysed within the DSR's workspace and not on the user's computer. A data-sharing agreement must be signed by the user.
• Open access: There is no formal process to access data. Researchers may explore but not download the data without a specific request.
• Different de-identification measures are adopted to protect the privacy of data subjects, e.g., pseudonymization or anonymization. An adequate de-identification (or encryption) measure is key to protecting study participants from reidentification. In the de-identification process, the participant's identifiable information is removed or replaced with a code, usually a random code number.
• Pseudonymization processes personal data in such a way that it can no longer be attributed to a specific data subject without the use of additional information (e.g., a specifically created confidential key). Anonymization is a process that destroys any link to an identified or identifiable person via a pseudonym.
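The difference between pseudonymization and anonymization described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any DSR in the review: keyed hashing (HMAC-SHA256) is just one possible way to derive a pseudonym, and the record fields, the identifier list, and the key are all hypothetical. Removing direct identifiers alone may not be sufficient for anonymization in practice, because combinations of the remaining fields can still identify a person.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"participant_id", "name", "date_of_birth"}


def pseudonymize(participant_id: str, secret_key: bytes) -> str:
    """Derive a stable code that only the holder of the confidential key can reproduce."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymize_record(record: dict, secret_key: bytes) -> dict:
    """Replace direct identifiers with a pseudonym; re-linking needs the key."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = pseudonymize(record["participant_id"], secret_key)
    return cleaned


def anonymize_record(record: dict) -> dict:
    """Drop direct identifiers and keep no pseudonym, so no link remains."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


# Hypothetical record and key, for illustration only.
record = {"participant_id": "SITE01-0042", "name": "Jane Doe",
          "age_months": 18, "weight_kg": 10.4}
key = b"hypothetical-confidential-key"

pseudonymized = pseudonymize_record(record, key)
anonymized = anonymize_record(record)
```

The pseudonymized record can be re-linked to the participant by whoever holds the key, whereas the anonymized record carries no code at all.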
Details on the information collected are presented in Table 1. These eight indicators were identified from the questionnaire (34 items) proposed by the CORBEL and IMPACT Observatory projects [29]. The indicators provide a general characterization of the DSRs and include aspects used to analyse them. The eight indicators were selected for relevance and measurability. Validation of the indicators showed internal consistency (Cronbach's alpha = 0.768, calculated from the pairwise correlations between items). The factor analysis indicated a structure of eight principal components that explain 58.7% of the total variance of the CORBEL and IMPACT questionnaire. Cronbach's alpha is a function of the number of test items and the average inter-correlation among them, and it was calculated in SPSS Statistics (IBM SPSS Statistics for Windows, Version 21.0. Armonk, NY, USA: IBM Corp.) using the Reliability Analysis feature [29].
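To make the dependence of Cronbach's alpha on the number of items and their average inter-correlation concrete, the sketch below uses the standardized form of the coefficient, alpha = k·r / (1 + (k − 1)·r). Two points are assumptions rather than statements from the source: that the reported value corresponds to this standardized form, and that it was computed over k = 8 items.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_correlation(k: int, alpha: float) -> float:
    """Invert the formula: mean inter-item correlation implied by a given alpha."""
    return alpha / (k - (k - 1) * alpha)


# Assumption: the reported alpha of 0.768 refers to the eight indicators (k = 8).
mean_r = implied_mean_correlation(8, 0.768)
round_trip = standardized_alpha(8, mean_r)
```

Under these assumptions, an alpha of 0.768 corresponds to a mean inter-item correlation of roughly 0.29.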
Two authors independently rated each DSR by classifying each indicator with a score from 0 to 2. Details about the classification system adopted by the authors are reported in Table 2.
This classification does not assess the quality of the DSR but evaluates its performance against the set of eight indicators. All cases of uncertainty, discrepancy, or missing data were resolved through discussion, searches for additional data sources, and consensus. Disagreements were resolved by consensus building with two other authors. To determine the degree of concordance between authors, we used Cohen's kappa approach [30], and an assessment of the DSRs was conducted through a performance score cluster analysis [31,32]. More specifically, the cluster analysis was performed to identify DSRs that fully meet the evaluation criteria based on the eight purpose-built indicators (total score descriptive analysis). The procedure automatically determined the optimal number of clusters by comparing the values of a model-choice criterion, Schwarz's Bayesian information criterion (SBIC), across different clustering solutions based on the indicators' total scores [33]. The likelihood distance measure assumes that variables in the cluster model are independent. Further, each categorical variable is assumed to have a multinomial distribution. Empirical internal testing indicates that the procedure is fairly robust to violations of both the assumption of independence and the distributional assumptions.
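As an illustration of the concordance measure, the following sketch computes an unweighted Cohen's kappa for two raters scoring one indicator on the 0 to 2 scale. The ratings are invented for the example (21 repositories, two disagreements) and are not the authors' data. The text does not say whether a weighted variant was used, so the unweighted form is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long sequences of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores for 21 repositories on one indicator.
rater_a = [2] * 10 + [1] * 6 + [0] * 5
rater_b = [2] * 9 + [1] * 6 + [0] * 6  # differs from rater_a on two items
kappa = cohens_kappa(rater_a, rater_b)
```

With these invented ratings the observed agreement is 19/21 and the chance agreement is 156/441, giving a kappa of 243/285, about 0.85.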
Patient and public involvement: No patients involved.

Literature Search
The literature search identified a total of 773 articles via PubMed (n = 743) or other sources (Google and co-author suggestions, n = 26) selected by checking whether the title and abstract contained mentions of DSRs. Articles containing DSRs that did not give access to IPD, articles citing the same DSR, and articles citing no longer active DSRs were excluded. A total of 31 articles were identified as eligible for analysis (n = 22 from PubMed; n = 6 from Google; n = 3 suggested by co-authors) (Figure S1).

DSRs Selection
From these 31 articles, 45 publicly accessible DSRs that potentially included paediatric CT data were identified (Figure S1). Sixteen were identified in PubMed, and twenty-seven through other sources mentioned above. A preliminary screening phase was carried out to check the eligibility of each DSR. Nineteen DSRs were excluded as they did not host a real DSR or did not give access to IPD. Two DSRs overlapped with another DSR, and one no longer existed. At the end of the screening phase, twenty-one DSRs met the identified eligibility criteria and were included in the analysis (Figure 1).
Most of the DSRs identified were found through PubMed searches (sixteen out of twenty-one). Two DSRs were identified through Google searches, and three were suggested by authors with expertise in the field (Table S2).
The BioCelerate DSR only provides access to detailed information about the DSR to members of associated companies, so an in-depth analysis of search options was not possible. Nevertheless, it was agreed not to exclude it from the analysis as it represents a possible data source.

Data-Sharing Repositories' Characteristics
The overall characteristics of the DSRs as well as the URLs of the associated webpages are reported in Table 3. The degree of concordance between authors in the evaluation of the eight indicators was good overall. A strong degree of concordance was obtained for the relevance and paediatric specificity indicator, with Cohen's kappa 0.856, 95% C.I. (0.670-1.042), and for the instructions for prospective data users, with Cohen's kappa 0.878, 95% C.I. (0.666-1.090). Likewise, the procedures for patient-level data access indicators (a), the IT security measures/protocols (b), the guidance on data composition/structure/format for data owners/submitters (c), and the sustainability indicator (d) were found to have a moderate degree of concordance:
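Both reported confidence intervals are symmetric around the kappa estimate and have upper bounds above 1, although kappa itself cannot exceed 1. This is what a normal-approximation (Wald) interval of the form kappa ± z·SE produces. That the intervals were built this way is an inference from their symmetry, not something stated in the text. The sketch below recovers the implied standard errors and shows an optional truncation to the admissible range.

```python
Z_95 = 1.96  # two-sided 95% normal quantile, rounded


def implied_se(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error implied by a symmetric interval [lower, upper]."""
    return (upper - lower) / (2 * z)


def wald_interval(kappa: float, se: float, z: float = Z_95, truncate: bool = False):
    """Normal-approximation interval; optionally clipped to kappa's range [-1, 1]."""
    lower, upper = kappa - z * se, kappa + z * se
    if truncate:
        lower, upper = max(lower, -1.0), min(upper, 1.0)
    return lower, upper


# Values reported in the text for the two indicators with strong concordance.
se_relevance = implied_se(0.670, 1.042)
se_user_instructions = implied_se(0.666, 1.090)

untruncated = wald_interval(0.856, se_relevance)
truncated = wald_interval(0.856, se_relevance, truncate=True)
```

Under this reading, the implied standard errors are roughly 0.095 and 0.108, and the truncated upper bound is simply 1.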

Analysis of the Eight Indicators
Details are reported in Table 4.

1. Relevance and Paediatric Specificity
In twelve out of the twenty-one DSRs, it was possible to search for paediatric CT data due to a filter for a specific or a generic age group (e.g., 0-6; 6-12; 6-18), or through the availability of specific keywords for the paediatric population (e.g., paediatric, neonates), with the exception of BioCelerate due to restricted access to detailed information about the DSR. Only the PTN and PCDC DSRs are completely dedicated to the paediatric population. Notably, PTN does not host its own DSR but shares data through the Data and Specimen Hub DSR (DASH). Five DSRs contain paediatric CT data that can be filtered by specific paediatric age groups (e.g., less than 2, between 5 and 10 years, etc.). Seven DSRs provided limited filtering options (e.g., filters for generic age groups) at the time of our evaluation, and in nine DSRs we were not able to filter, download, or easily access exclusively paediatric data.

2. Instructions for data owners/data submitters
Thirteen DSRs provide clear, easily understandable instructions for data owners/submitters on which data are in scope and how to submit data, including information on any specific formats or requested schemas. Two DSRs provide only basic, minimal, and non-exhaustive instructions about 'how to upload', or these do exist but were not publicly available at the time of our review. In six DSRs, we were not able to find instructions for data owners/submitters to advise what data are in scope and how to submit data.

3. Instructions for prospective data users
Seventeen DSRs provide clear, easily understandable instructions for prospective data users on how to access and/or analyse data. In four DSRs, we were not able to find clear instructions for prospective data users, or only basic, minimal, and non-exhaustive instructions are publicly available.

4. Guidance on data composition/structure/format for data owners/submitters
Seven DSRs provide clear, easily understandable guidance or recommendations for data owners/submitters on specific models, standards, or formats for data or metadata that can be hosted in the DSR. The most common types of file formats for the data download are SAS and CSV. Three DSRs provide basic, non-exhaustive guidance or recommendations. In eleven DSRs we were not able to find any guidance or recommendations freely available within the DSRs.

Data Protection
Three DSRs clearly reported a data protection policy, providing on their webpage information about measures to protect data privacy through de-identification and anonymization (or pseudonymization) processes. At the time of our research, eight DSRs reported only general information about the data protection measures adopted, but no data protection policy was specified or made publicly available. We were not able to easily find this information in ten DSRs.

Procedures for Patient-Level Data Access
Seventeen DSRs clearly present procedures and materials relating to IPD access agreements, and/or a data access agreement template is available for adoption. Two DSRs mentioned the procedures that should be adopted, but they were not extensively explained. In two DSRs, we were not able to identify clear, easily understandable measures/procedures to access data.
Access to IPD varies between DSRs:
• Data sharing is adopted in six DSRs.
• The controlled access model is adopted by twelve DSRs.
• Open access is adopted only by one DSR.

IT Security Measures/Protocols
Three DSRs had protocols available on their websites for regularly testing, assessing, and evaluating the effectiveness of technical and organizational measures to ensure the security of the processing in place. Nine DSRs reported only a summary protocol. For nine DSRs we were not able to find a security protocol or safety measures publicly available on the website.

Sustainability
Nineteen DSRs reported on their website that they receive regular funding or are regularly sustained and can demonstrate business continuity measures. Only two seem to have no regular/sustained funding but have business continuity measures in place. Sixteen DSRs are sustained by a public funding source, including all three European DSRs. Four DSRs are sustained by private funding, and one is based on public-private partnerships.
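The per-indicator counts reported above can be cross-checked against the twenty-one included DSRs. The sketch below tabulates the counts exactly as given in the text, in the order in which each paragraph reports them, and verifies that every indicator accounts for all repositories. The dictionary layout is only an illustrative choice.

```python
TOTAL_DSRS = 21

# Counts as reported in the text, in the order each paragraph lists them.
reported_counts = {
    "Relevance and paediatric specificity": (5, 7, 9),
    "Instructions for data owners/submitters": (13, 2, 6),
    "Instructions for prospective data users": (17, 4),
    "Guidance on data composition/structure/format": (7, 3, 11),
    "Data protection": (3, 8, 10),
    "Procedures for patient-level data access": (17, 2, 2),
    "IT security measures/protocols": (3, 9, 9),
    "Sustainability": (19, 2),
}

# Any indicator whose counts do not add up to the number of included DSRs.
mismatches = {
    name: sum(counts)
    for name, counts in reported_counts.items()
    if sum(counts) != TOTAL_DSRS
}

# Share of DSRs in the first (most favourable) category of each indicator.
top_category_share = {
    name: counts[0] / TOTAL_DSRS for name, counts in reported_counts.items()
}
```

All eight indicators sum to twenty-one, and the funding-source breakdown (sixteen public, four private, one public-private) does as well.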

Cluster Analysis
A cluster analysis was performed to identify DSRs that fully meet the evaluation criteria based on the eight purpose-built indicators (total score descriptive analysis). The number of clusters to be formed was not specified in advance and was calculated using Schwarz's Bayesian information criterion (SBIC). The cluster outcome showed two groups in terms of elements evaluated: one cluster consisting of five DSRs that meet our evaluation criteria (cluster centroid mean score = 13.40), reporting a higher performance score, and one cluster consisting of the remaining sixteen DSRs (cluster centroid mean score = 8.81), with a lower performance score (Table 5).
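The two centroid scores are easier to interpret against the maximum attainable total. With eight indicators each scored from 0 to 2, that maximum is 16. The short calculation below uses only the cluster sizes and centroid means reported in the text. The overall mean it derives is a size-weighted combination of the two centroids and is not a figure reported by the authors.

```python
N_INDICATORS = 8
MAX_SCORE_PER_INDICATOR = 2
MAX_TOTAL = N_INDICATORS * MAX_SCORE_PER_INDICATOR  # 16

# (number of DSRs, centroid mean total score) as reported in the text.
clusters = {"higher": (5, 13.40), "lower": (16, 8.81)}

n_total = sum(size for size, _ in clusters.values())
# Size-weighted mean of the two centroids, i.e. the mean total score over all DSRs.
overall_mean = sum(size * mean for size, mean in clusters.values()) / n_total
# Each centroid expressed as a fraction of the maximum attainable score.
fraction_of_max = {name: mean / MAX_TOTAL for name, (_, mean) in clusters.items()}
```

The higher-scoring cluster sits at about 84% of the maximum and the lower-scoring cluster at about 55%, with an overall mean near 9.9 out of 16.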

Discussion
The majority of the DSRs analysed in this research were identified through a PubMed search, Google search, or through private contacts. All the included DSRs were fully active at the time the research was carried out. Only one DSR, Rapid-19 (https://www.rapid-19.org/DSR-data accessed on 3 October 2023), was no longer active, probably because it was built during the COVID-19 emergency. It was therefore excluded from the analysis.
Eighteen of the twenty-one identified DSRs are located in the US, and three are in Europe, highlighting the lack of eligible DSRs in the rest of the world. The origin of the IPD stored in the identified DSRs was beyond the scope of this review and was not investigated. The US led the evolution of transparency in CTs with the requirement for registration of clinical trials by ICMJE and FDAAA (2004) [34][35][36]. This was followed by EMA Policy 0070 (2014) [37]. Other relevant initiatives include the PhRMA/EFPIA principles for data sharing (2014) and the IOM Sharing Clinical Trial Data report (2015).
Since 2018, hundreds of ICMJE journals have started to require authors to complete a data-sharing statement describing who, what, when, where, and why IPD will be shared [38].
Different types of DSRs were identified: those specific to study data (e.g., age-specific, disease-specific, or stakeholder-specific) and generic DSRs that collect broader clinical research data. Generic DSRs represent the majority of the twenty-one analysed DSRs.
Most of the DSRs supported research on paediatric IPD. However, only the PTN and the PCDC DSRs were focused on paediatrics. PTN is sponsored by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and does not have a dedicated DSR. PTN data are shared through the Data and Specimen Hub (DASH) (https://dash.nichd.nih.gov/, accessed on 3 October 2023), of which PTN is only a subset.
PCDC, sponsored by the University of Chicago, hosts the world's largest harmonized clinical data set for paediatric cancer research and provides data through a unified DSR for researchers, the PCDC Data Portal.
Most of the identified DSRs appeared to satisfy the proposed indicators and almost all the DSRs could provide useful data for secondary use. All the DSRs identified allow data to be stored and accessed for free. Access models range from publicly accessible web-based systems with the option to download datasets to different types of request/review mechanisms that may or may not allow data to be downloaded.
Most of the DSRs provide guidance or instructions for both data owners and data users and clear, understandable information about the procedures for patient-level data access. This information is available on the platform's website. The most common method for patient-level data access requires the submission of a research proposal which is reviewed by an independent review panel (IRP). A signed data-sharing agreement is usually required before accessing data. Most of the DSRs have clear, easily understandable measures to protect data privacy and provide an environment with anonymized data upon approval of the request.
DSRs are mostly sustained by public funds, and all the European DSRs are in this category. This is a significant advantage since developing and maintaining useful DSRs for efficient data sharing tends to be expensive. Data from different sources are often collected in different formats, using different protocols and endpoints, and must be quality-controlled and standardized before analysis can be performed across studies. The upfront costs of developing community standards and networks of collaboration may be high [2]. This could be impactful when the population is small, as with the paediatric population, where stakeholders are fragmented, and there are a limited number of interested parties. However, once these investments have been made, the time and effort required by potential users is relatively low, and the potential for data to be reused in ways that benefit public health is high, making the investment cost-effective [2].
There is also heterogeneity among the DSRs in terms of data upload, data handling, and DSR access. The lack of commonly acknowledged guidelines on the structure of DSRs for the sharing of CT data inevitably leads to inconsistency in the data available. This may affect the availability and quality of paediatric data for secondary use and represents a barrier to data availability that could be mitigated by adopting common international standards [39]. This heterogeneity limits the interoperability within DSRs and, consequently, the ability to have a representative homogeneous IPD sample.
Information about IT security measures is only reported on a few websites, despite the importance of making this information publicly available. An adequate and transparent data protection policy and ad hoc IT measures may guarantee better quality of data, prevent data breaches, and increase the confidence of users of the DSR.

Strengths
To our knowledge, this is the first report addressing the availability of paediatric IPD within DSRs of CT data. Only four studies have previously addressed similar topics: the first was carried out by N. Anthony et al., the second by Ohmann et al., the third by Banzi et al., and the fourth was published by the Clinical Research Data Sharing Alliance (CRDSA). N. Anthony and co-workers have recently analysed the digital impact of published reuse of clinical data in terms of media attention and citation rates on three DSRs (CSDR, YODA, and Vivli). They did not find a substantial difference between reusing data from DSRs and using a sample of equivalent studies published in the same journals [40]. The study by Banzi et al. is not focused on this specific population [26]. On the other hand, Ohmann and colleagues provide an overview of the status and use of the sharing of IPD and make recommendations to address common barriers, such as structuring data and metadata using recognized standards, managing DSR data, and accessing and monitoring data sharing, but do not specifically address the paediatric field [41]. Last but not least, the "Review of Biopharma Sponsor Data Sharing Policies and Protection Methodologies", recently published by the CRDSA, provides an overview of key policy elements that impact the value of research benefits to end users, such as what data are shared and how data are transformed to protect patient privacy across only three data-sharing platforms: Vivli, CSDR, and YODA [42].
This study also highlights the importance of the work being carried out by c4c (https://conect4children.org/, accessed on 17 April 2024) that aims to facilitate the development of new drugs and other therapies for the entire paediatric population through the creation of systems, tools, and standards to enhance the quality, utility, reusability, and uniformity of the data collected during paediatric clinical trials.

Limitations
None of the twenty-one DSRs are designed specifically for the paediatric population and its specific characteristics. This inevitably impacts the ability to carry out paediatric research due to the huge differences in the paediatric population, ranging from neonates to adolescents. To carry out effective research on a specific cohort of paediatric patients, a dedicated DSR is needed that is tailored to their characteristics (e.g., that allows for the selection of preterm subjects under 1 kg). The PTN alone does not support this level of specificity.
This study has some potential weaknesses and limitations. Not all the available DSRs have been identified, mainly due to the dynamic nature of the topic and the time needed for a publication. Despite this, we attempted to use the most rigorous and extensive search strategy to identify as many DSRs as possible. The search strategy adopted was not intended to be systematic, but it can be considered the most appropriate to provide a descriptive overview of the available DSRs. Since we included only DSRs with information available in English, it is likely that some DSRs, mainly from non-English speaking countries, have been missed. Using only PubMed and Google for literature and DSR searches may lead to bias, as relevant studies or repositories not indexed in these databases may be missed. Investigating additional databases or sources could provide a more comprehensive overview. Limiting the Google search to the first hundred results may not capture all relevant information, as other repositories or studies may appear beyond the first hundred results.

Future Perspectives
The number of DSRs is expected to grow over time, mainly due to the policies and initiatives implemented in the last two decades [10], and standard instruments (e.g., checklists) for assessment of the suitability of DSRs could be beneficial.Further efforts are needed to raise awareness about these DSRs as a central, safe point for researchers to find data from CTs shared by public or private sponsors, enhancing their value and creating ad hoc methods or procedures to reuse data in a responsible and standardized way [26].This is especially important in the paediatric context in which nonreporting/nonpublication of findings remains common [8].
Moreover, the process of harmonization and standardization of data (e.g., CDISC standards, https://cdisc.org/, accessed on 3 October 2023) is time-consuming and costly but is an essential step in making data more FAIR. It would be beneficial to address the economic hurdles associated with making DSRs more FAIR across different specialities and countries [24,43].
Achieving interoperability between different DSRs means ensuring that these repositories can exchange data and work together seamlessly by implementing mechanisms for mapping and translating data between different formats and schemas.
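A mapping and translation mechanism of this kind can be sketched as a per-repository field map plus value converters that bring records into one common schema. Everything in the example is hypothetical: the two source layouts, the field names, and the target schema are invented for illustration and do not correspond to any DSR in the review or to a specific standard such as CDISC.

```python
def to_common(record, field_map, converters=None):
    """Translate one source record into the common schema.

    field_map maps source field names to common field names;
    converters maps source field names to functions applied to the value.
    """
    converters = converters or {}
    common = {}
    for source_field, target_field in field_map.items():
        if source_field in record:
            convert = converters.get(source_field, lambda value: value)
            common[target_field] = convert(record[source_field])
    return common


# Hypothetical repository A: age in years, sex already coded as "F"/"M".
REPO_A_MAP = {"subj": "subject_id", "age_years": "age_months", "sex": "sex"}
REPO_A_CONVERTERS = {"age_years": lambda years: years * 12}

# Hypothetical repository B: age in months, sex spelled out under "gender".
REPO_B_MAP = {"id": "subject_id", "age_months": "age_months", "gender": "sex"}
REPO_B_CONVERTERS = {"gender": lambda g: {"female": "F", "male": "M"}[g]}

pooled = [
    to_common({"subj": "A-1", "age_years": 2, "sex": "F"},
              REPO_A_MAP, REPO_A_CONVERTERS),
    to_common({"id": "B-7", "age_months": 30, "gender": "female"},
              REPO_B_MAP, REPO_B_CONVERTERS),
]
```

Keeping the maps as data rather than code means a new repository can be added by supplying one more map and converter set, and the pooled records can then be filtered by a single paediatric age field.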
Collaborative initiatives and data-sharing networks can promote data sharing by establishing common practices, protocols, and governance frameworks. They support semantic interoperability to standardize the semantics of data elements and facilitate accurate interpretation and integration of data across repositories. Data interoperability has several advantages, such as greater statistical power, the ability to pool data for post hoc analysis, support for pragmatic clinical trials, and analysis of under-represented subgroups [43][44][45][46][47].
Several additional challenges must be addressed, particularly in emerging economies. These challenges include legal and policy issues, scarcity of coordination between research groups, lack of a culture for data sharing, ethical/privacy considerations, insufficiency of proper infrastructure (including high-speed Internet connectivity), deficiencies in the interoperability of DSRs, shortage of data managers and data scientists, and a scarcity of open data DSRs to facilitate data sharing [27].
Data sharing can also help to develop and validate artificial intelligence (AI) models in the medical field in areas such as electronic medical records, medical imaging technology, medical big data, intelligent drug design, and smart health management systems. AI solutions can potentially improve the standardization and accuracy of clinical decision making while providing more dimensions of data accumulation for medical knowledge-based systems. These developments can also support physicians and researchers in the optimization of treatment plans and decision making about optimal treatment options [7].

Conclusions
Although data sharing is widely recognized as a fundamental requirement of scientific research and strongly encouraged, only a few CT DSRs exist in the paediatric field. To the best of our knowledge, this work is the first report addressing the availability of paediatric IPD within DSRs of CT data.
This work provides an inventory of the main DSRs containing paediatric clinical trial data, describing their main characteristics to disseminate and encourage the knowledge and subsequent use of DSRs. The latter may facilitate a clear and transparent sharing of paediatric CT information in the scientific community and relieve researchers, data managers, and sponsors of the ethical, regulatory, and economic burdens, shortening the time to respond to paediatric therapeutic needs.
Eight criteria were identified and used to assess the comprehensive suitability of DSRs. The overall result shows heterogeneity between DSRs in terms of data upload, data handling and access to the DSR, instructions for data submitters and users, procedures for

Figure 1. Flow diagram for the identification of DSRs.

• Funding: private or public or public/private sponsor
• Funding: regular/sustained funding and business continuity measures

Table 2. Evaluation of the eight indicators.

Table 3. Main characteristics of the included DSRs.

Genomic Data and DSR Information Coordinating Center (BioLINCC) was the first DSR established in 2000, while the most recent is the Rare Disease Cures Accelerator-Data Analytical Platform (RDCA-DAP) founded in the US in 2021. Details about the year of establishment are reported in Table 3. Most of the DSRs cover more than one therapeutic area (n = 15), and

Table 4. Analysis of the eight indicators.

Table 5. Total score descriptive analysis.