3.1. The Scientific View on FAIR Digital Objects
To meet the scientific needs and expectations described in the previous chapter, we envisage a unit of data that is able to interact with automated data processing systems. We call this a FAIR Digital Object (FDO). From the perspective of a data scientist, an FDO is a stable actionable unit that bundles sufficient information to allow the reliable interpretation and processing of the data contained in it. In this section, we will give an introductory description of the FDO concept without going into technical details.
The encapsulation of information in an FDO is illustrated in Figure 3. FDOs are accessed through their PID. They may receive requests for operations, which they may inherit from their type, as known from object-oriented programming. Through operations, their metadata can be accessed, which in turn describes the enclosed data content (a bit sequence). FDOs enable abstraction, i.e., at the object management level it does not matter whether the FDO content is data, metadata, software, assertions, knowlets, etc. In an FDO, a data bit sequence is bound to all necessary metadata components (descriptive, scientific, system, access rights, transactions, etc.) at all times. While the PIDs and metadata of FDOs are normally open, access to their bit sequences may be subject to authentication and authorization, for example, to secure personal or otherwise protected data.
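The encapsulation described above can be made concrete with a minimal sketch. It is an illustration only, not part of any FDO specification: the class names `FDO` and `FDOType`, the operation names `get_metadata` and `get_bits`, the PID `example/0001`, and the simple set of authorized agents are all assumptions introduced here. The sketch shows an object that is addressed by its PID, inherits its operations from its type, keeps its metadata open, and protects its bit sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class FDOType:
    """Hypothetical type that carries the operations its objects inherit."""
    name: str
    operations: Dict[str, Callable[["FDO", str], object]] = field(default_factory=dict)


@dataclass
class FDO:
    pid: str                   # persistent identifier, openly visible
    fdo_type: FDOType          # operations are inherited from the type
    metadata: Dict[str, str]   # normally open
    bits: bytes                # enclosed content; may be protected
    authorized: Set[str] = field(default_factory=set)  # empty set means open access

    def request(self, operation: str, agent: str) -> object:
        """Dispatch an operation request to the handler defined by the type."""
        handler = self.fdo_type.operations.get(operation)
        if handler is None:
            raise ValueError(f"{self.fdo_type.name} does not support {operation!r}")
        return handler(self, agent)


def get_metadata(obj: FDO, agent: str) -> Dict[str, str]:
    # Metadata is open: any agent receives a copy.
    return dict(obj.metadata)


def get_bits(obj: FDO, agent: str) -> bytes:
    # The bit sequence is released only to authorized agents.
    if obj.authorized and agent not in obj.authorized:
        raise PermissionError(f"{agent} may not read the bit sequence of {obj.pid}")
    return obj.bits


# Assumed example type and object; all values are made up.
DATASET = FDOType("Dataset", {"get_metadata": get_metadata, "get_bits": get_bits})

obj = FDO(
    pid="example/0001",
    fdo_type=DATASET,
    metadata={"title": "Sensor run", "license": "restricted"},
    bits=b"\x00\x01\x02",
    authorized={"alice"},
)
```

In this sketch any agent can call `obj.request("get_metadata", ...)`, whereas `obj.request("get_bits", "bob")` raises `PermissionError` because only `alice` is authorized. A real implementation would replace the in-memory set with proper authentication and authorization.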
From a researcher’s point of view, we can imagine the following ideal data flow scenario: A sensor or other source produces a bit sequence of data, associates metadata with it and places both in the care of trustworthy repositories. These repositories analyze the metadata and may decide to host and manage the new data as an FDO. Doing so implies that the data and metadata will be bundled in an FDO to which a type (with associated operations) and a PID will be assigned. Furthermore, the repository may extend the metadata based on known contexts; for instance, policies for permission to use the data may be turned into licenses and access control lists. Then repositories may propagate the FDO to other agents on the Internet, which examine the metadata of the new FDO and decide whether the new data are of interest. If so, they will seek to get access to the content, contingent upon licensing, security measures and legal constraints encapsulated in the FDO. Compared to older protocols such as OAI-PMH,19
new types of protocols will be used by repositories to offer their holdings, which are not restricted to metadata but may include any type of Digital Object.
All actors in cyberspace dealing with FDOs are connected using a unifying protocol which guarantees interoperability. For this purpose, we propose the DO Interface Protocol (DOIP), the role of which can be compared to that of HTTP on the Web. For technical details on DOIP, we refer to the version 2 specification which has been published.20
The concept of FAIR Digital Objects, which will enable this innovative scenario, goes back to papers in the 1990s by Kahn and Wilensky [14]. Earlier, when Kahn designed the basic principles of the Internet, where scientifically meaningless datagrams were being routed, it was already understood that the objects to be exchanged between senders and consumers must be assigned some meaning. Shortly after, the design of the World Wide Web by Berners-Lee21
represented the first step in this direction. However, despite its benefits, the Web remains an ephemeral technology, of which widespread link rot is a symptom. Therefore Cerf, a colleague of Kahn in the development of TCP/IP, stated that we risk sinking into a “dark digital age”.22
The introduction of stable FDOs based on persistent identifiers will not change the current bulk of the Web, which is unstructured information, but it will offer a more stable and lasting solution for datasets in the registered domain, which need to be preserved for a long time. This stability will increase the level of trust among researchers and other stakeholders who are investing considerable effort in the preservation of scientific knowledge. This will, in turn, provide the following advantages for research and development to address the challenges that were discussed in Section 2:
Scaled Cross-Disciplinary Capabilities: FDOs are a way to create an interconnected multi-actor and multi-level ecosystem, since there is a protocol (DOIP) that speaks the “FDO language” to all actors in this interoperable global domain of digital objects. This will allow us to invest in a new set of tools supporting cross-disciplinary research more efficiently than current data practices allow.
Data Made Accessible: The gap between the amount of data being created and our capability to make use of it has several causes. Among these, the lack of specialized skills and the low level of recognition for cross-disciplinary work cannot be addressed directly by FDOs. However, one cause, the lack of contextual information, will partly be addressed by the FDO concept, which enables binding of contextual information to data in a stable and persistent way.
Interpreting Scientific Evidence in a Trusted Context: In FDOs, contextual and fingerprint information can be associated with digital objects at different steps during their lifetime. Privacy information can and must be associated with each digital object in a tamper-free way. In fact, all metadata are always bound to the data, so that researchers always have access to provenance and other information necessary for assessing fitness for purpose—thus building trust.
Domain of Reasoning: The evolving complex domain of knowledge in and across all scientific domains drives us towards automatic processing in our quest for data-driven conclusions. As actionable units, FDOs capture and build complex relationships over long time periods. Thus, they form building blocks that build knowledge structures for our evolving digital scientific memory.
Advancing Data to Actionable Knowledge Units: As FDOs travel through cyberspace and time, their encapsulation ensures that even after decades and in spite of changing technology and changing actors, data and their context will remain available as complete units. They will not lose any information but may accumulate contexts of reuse over time.
Tool Proliferation and Fundamental Decisions: The FDO concept achieves abstraction that hides technological details from the researcher, thus preventing technological lock-in and allowing technological innovation without putting the evolving Digital Knowledge Domain at risk. Virtualised registry and repository services can be connected into a federated core using unified DO protocols and offering understandable client interaction at the service layer.
3.2. Digital Objects in the DFT Core Model
Although FDOs represent a quantum leap in data management, the concept can be implemented on top of the existing Internet protocols, as already suggested in earlier proposals [14]. After the World Wide Web established HTML resources as referenceable and shareable digital entities, this technology started to open possibilities for the Semantic Web23
and the Linked Data Platform.24
However, the linked data concept is not sufficient [16], as also concluded at a recent workshop [17]. Subsequently, the term Digital Object (DO) found its revival in two developments: (a) the work of the Data Foundation and Terminology Group of RDA,25
which extracted a core data model from many scientific use cases, and (b) the design of large cloud systems, also called “object stores”. In the end, the definitions of DO made by the RDA DFT Core Model [18] did not differ much from the definitions used by Kahn and Wilensky. For more details on the term and concept of DO see [17].
Two diagrams explain the pervasive nature of DOs in the DFT Core Model. The diagram in Figure 4 indicates the simple structure of this model. The content of a DO is encoded as a structured bit-sequence and stored in repositories. It is assigned a globally unique, persistent and resolvable identifier (PID), as well as rich metadata (descriptive, scientific, system, provenance, rights, etc.). Metadata descriptions themselves are DOs. Moreover, DOs can be aggregated into collections, which are also DOs whose content consists of the references to their components. This simple definition makes DO a generic concept, abstracting away from the many possible types of content of a DO, and covering the whole domain of digital data entities.
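The recursive character of this model, in which metadata and collections are themselves DOs, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `DigitalObject` class, the in-memory `repository` dictionary, the helper names, the choice of a JSON list of PIDs as collection content, and all PIDs are hypothetical and are not prescribed by the DFT Core Model.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DigitalObject:
    pid: str                            # globally unique identifier (made up here)
    content: bytes                      # structured bit sequence
    metadata_pid: Optional[str] = None  # the metadata description is itself a DO


# Toy in-memory repository keyed by PID (assumption for illustration).
repository: Dict[str, DigitalObject] = {}


def store(obj: DigitalObject) -> DigitalObject:
    repository[obj.pid] = obj
    return obj


def make_collection(pid: str, members: List[DigitalObject]) -> DigitalObject:
    """A collection is a DO whose content is the list of its members' PIDs."""
    refs = json.dumps([m.pid for m in members]).encode("utf-8")
    return store(DigitalObject(pid, refs))


def members_of(collection: DigitalObject) -> List[DigitalObject]:
    """Resolve the references stored in a collection back to DOs."""
    return [repository[p] for p in json.loads(collection.content)]


meta = store(DigitalObject("example/meta-1", b'{"title": "Run 1"}'))
data = store(DigitalObject("example/data-1", b"\x01\x02", metadata_pid=meta.pid))
coll = make_collection("example/coll-1", [data, meta])
```

Because data, metadata and collections all share one shape, the same `store` and lookup code handles each of them, which is the abstraction the model aims for.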
The diagram in Figure 5 indicates the role of the PID as the anchor point for accessing and reusing the DO. Assuming that the PID is indeed persistent, which is based on a cultural agreement, it makes sense to bind essential information into the PID record which will be returned to the user when a PID is being resolved. This essential information may contain paths to access the bit sequence, the metadata (also a DO), the rights record containing permission specifications, a pointer to a blockchain entry storing the transactions, a checksum for verification, etc. Furthermore, DOs are typed; operations are associated with a DO based on its type, which is a familiar, powerful concept from object-oriented programming. The RDA Kernel Information group26
defined a first core set of attributes which are of relevance for scientific disciplines and registered them in a public type registry. The nature of type registries has been specified by another RDA group.27
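As a rough illustration of how a PID record can act as the anchor point, the sketch below resolves a PID to its record and uses the stored checksum to verify the bit sequence. Everything in it is assumed for the example: the attribute names (`content_path`, `metadata_pid`, `rights_pid`, `checksum_sha256`, `type`), the PIDs, the `repo://` path, and the two in-memory dictionaries that stand in for a PID resolution service and a repository. The attributes actually registered by the RDA Kernel Information group may differ.

```python
import hashlib
from typing import Dict

# Stand-in for a repository holding bit sequences (assumption).
content_store: Dict[str, bytes] = {"repo://data/0001": b"temperature,21.5\n"}

# Stand-in for a PID resolution service returning PID records (assumption).
pid_registry: Dict[str, Dict[str, str]] = {
    "example/0001": {
        "content_path": "repo://data/0001",
        "metadata_pid": "example/0001-md",
        "rights_pid": "example/0001-rights",
        "checksum_sha256": hashlib.sha256(b"temperature,21.5\n").hexdigest(),
        "type": "example/types/csv-timeseries",
    }
}


def resolve(pid: str) -> Dict[str, str]:
    """Return the PID record; raises KeyError for an unknown PID."""
    return pid_registry[pid]


def fetch_verified(pid: str) -> bytes:
    """Fetch the bit sequence and check it against the recorded checksum."""
    record = resolve(pid)
    bits = content_store[record["content_path"]]
    if hashlib.sha256(bits).hexdigest() != record["checksum_sha256"]:
        raise ValueError(f"checksum mismatch for {pid}")
    return bits
```

With such a record, a client needs only the PID to locate the content, its metadata and its rights record, and it can detect whether the bit sequence has changed since the checksum was registered.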
3.3. From Digital Objects towards FAIR Digital Objects
The above definition of a DO already satisfies some of the FAIR principles and has been used as a blueprint for their implementation [19]. Intensive discussions between RDA and GOFAIR experts over the past year revealed that additional specifications were required to make DOs fully FAIR compliant. Early papers, including one by a European Commission Expert Group on FAIR Data, coined the term FAIR Digital Object [20], but it was L. Bonino who recently identified the missing parts [22]. It has thus become obvious that the specifications of the DFT Core Model were not sufficient to guarantee machine actionability with respect to all FAIR principles. The RDA Kernel group defined kernel attributes and registered them, but the DO model did not make any statements about their usage. The FAIR Digital Object (FDO) model needs to be specific on three aspects:
The FDO model requires the definition of PID attributes and their registration in a trustworthy type registry or a more complex type ontology, while trustworthy repositories are requested to use these attributes in order to achieve interoperability and machine actionability.
The FDO model requires metadata descriptions to be interpretable by machines. This implies that their semantic categories must be declared and registered. A moderate requirement could be to declare at least metadata categories strictly necessary for basic management, such as where to find the PIDs of relevant information components.
The FDO model requires the construction of collections to be machine actionable, thus enabling machines to parse collection descriptions and to find their component DOs.
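The first of these aspects, the use of registered PID attributes, lends itself to a small sketch of what a machine check might look like. It is a hedged illustration only: the `type_registry` dictionary, its attribute names, the `REQUIRED` tuple and the `check_record` function are invented for this example, and real type registries describe attributes in far more detail than a Python type.

```python
from typing import Dict, List

# Hypothetical type registry: attribute name -> expected Python type.
type_registry: Dict[str, type] = {
    "metadata_pid": str,
    "checksum_sha256": str,
    "members": list,
}

# Attributes assumed to be strictly necessary for basic management.
REQUIRED = ("metadata_pid",)


def check_record(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems: List[str] = []
    for name in REQUIRED:
        if name not in record:
            problems.append(f"missing required attribute: {name}")
    for name, value in record.items():
        expected = type_registry.get(name)
        if expected is None:
            problems.append(f"unregistered attribute: {name}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems
```

A repository could run such a check before accepting a record, so that every attribute a machine later encounters is one whose meaning has been declared in a registry.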
It is still an ongoing task to specify the required semantic explicitness in necessary detail to support the FAIR principles and make FDOs fully machine actionable. A recent workshop28
resulted in the formation of a coordination group and a technical implementation group that will define formal processes around requirements for FDO, called FDO Framework (FDOF),29
and will elaborate the specifications. FDOF will allow for different technical implementations while still guaranteeing interoperability.
Researchers from many disciplines have been active in formulating organizational and technical specification details of FDOs since their scientific relevance became clear. The present background reveals a pressing need to structure and represent increasingly complex scientific knowledge in a way that is not only persistent but also independent of evolving underlying technology. International discussions have been held in two RDA groups, the Data Fabric Interest Group30
and the Digital Objects subgroup of the RDA Group of European Data Experts (GEDE-DO31
), as well as in the C2CAMP32
cooperation. We are currently seeing a tendency for these discussions to converge, while additional actors, e.g., from GOFAIR,33
are joining [19].
Workflow frameworks that create and consume FDOs are expected to become increasingly popular, as further discussed in the next section. All components in a specific workflow (workflow script, software tools being used, data being processed, ontologies being applied, etc.) can be seen as a complex collection consisting of different object types. The goal of reproducibility suggests putting such collections into a container that can be transferred to another computational environment where it retains its full functionality. In this respect, the Research Objects (RO) initiative34
has been very active in specifying container and transmission standards that would ideally allow the execution of a workflow, including all its components on different virtual machines. Thus, the RO initiative is working towards complementary goals, so that a close collaboration can be envisaged.