1. Introduction
Management of data in complex processes implies managing all phases upon which applications rely, as well as ensuring compliance with regulations concerning data storage and security. Regulations may vary according to the kind of data retained: e.g., data subject to the GDPR must be kept for the minimal time needed for processing and must not be used for purposes that have not been explicitly authorized by the data subjects. In addition, management should support the efficiency, in terms of performance, of all processes that run over data on the same information system. Finally, the scale of data is increasing, and the organization of data is getting more diverse, now including relational and non-relational databases, data warehouses, data lakes, and other sources.
Decision making under such complex conditions is not trivial, and requires tools to be performed in an informed way. Data Lifecycle Models [
1] provide a reference of best practices, validated in some measure by their successful use in different domains and by the corpus of experience that can be leveraged in new projects. Data lifecycles have been studied and documented in the scientific and technical literature, but they generally exhibit numerous aspects to be evaluated, are composed of different sets of phases, and present different parameters, advantages, and drawbacks that must be taken into account when defining a new system, maintaining existing systems, or operating them as the workload evolves. Decisions must be based not only on quantitative parameters but primarily on qualitative ones, which require expertise and experience and may differ from phase to phase, or may pertain to a single phase or to the whole lifecycle.
The original contribution of this paper is a novel decision support approach to DLM choice and design, based on the Analytic Hierarchy Process (AHP) and supported by a tool we developed for data managers. The tool allows data managers to exploit the expertise of data lifecycle and domain experts to choose the most appropriate data lifecycle, comparing it with several reference data lifecycles from the literature; it supports the analysis with qualitative and quantitative references that condense the knowledge and experience of the panels of experts, and it can be extended and customized by adding or updating experts’ or specialists’ contributions. The tool fully supports the approach by allowing incremental evaluation when the experts’ panel changes, evolutionary evaluation over time, weighting of evaluations according to the level of expertise of each panelist, and the addition of novel data lifecycles to the set of reference lifecycles. The tool has been developed in the framework of a complete data lifecycle evaluation methodology defined by our research group, but it is meant for reuse both in different projects and for different purposes.
Our approach defines proper evaluation criteria and sub-criteria for each phase, which are weighted appropriately for each single case to synthesize a reference about quality, security, compliance, and general governance for the information system. The goal is to help non-specialists of the application domain decompose and systematize a plurality of complex judgments and to update the outcomes periodically, so that data managers can choose the Data Lifecycle Model that best fits their particular data management problem.
That said, we designed the proposed tool to support decisions on the basis of abstract, high-level comparative criteria that differ for each phase of a data lifecycle, constituting a range of options related to data planning, assessment, and governance.
This paper is structured as follows: the next section provides the necessary background, while in
Section 3 are discussed the motivations that drive the approach and the related tool;
Section 4 summarizes the basic ideas behind AHP;
Section 5 describes the proposed approach;
Section 6 presents the support tool;
Section 7 shows the application of the method to a simple case by means of the tool, while in
Section 8, the impact of blockchain technologies on Data Lifecycle Management is discussed; and conclusions close the paper.
2. Background
The GDPR (General Data Protection Regulation), which became enforceable in May 2018, requires the implementation of a Data Management Plan (DMP), in particular for projects involving personal data. The DMP covers all aspects of data management, from collection and storage to sharing and access, and it characterizes the data management strategy of organizations (large or small, public or private). In this context, an essential instrument that can be adopted by companies or public administrations is the Data Lifecycle Model (DLM). A DLM can be considered a set of interrelated phases through which data flows and the processes used to transform data into knowledge can be identified. In principle, there are as many DLMs as there are organizations that need to use them, since the models are easy to adapt to different situations. In fact, the literature proves a high degree of flexibility of these models.
In [
2], a description is provided of a DLM with fourteen elementary phases that can be used to represent the basic properties of DLMs. Consider that the fourteen steps described in [
2] are not strictly ordered and, above all, do not all need to appear in each implementation of a DLM. Moreover, increasing the complexity of the model tends to correspond to higher costs for organizations. Let us consider the DLM for a scientific dataset as an example. Depending on the nature of the application, not all phases are required in such a DLM: for example, if no personal data are involved, there is no need to include a privacy-related phase. Including every step in a model is certainly not forbidden, but the choices about the structure of the information system should be integrated and economically sound. The sequence of DLM phases is not rigid but very flexible, and some phases described in [
2] (Governance, Safety and Security, Quality) can even be considered “transversal” so that they do not actually fit a proper sequence of phases. Considering in this case as an example two DLMs designed to collect scientific data, USGS [
3] and DataONE [
4], in the first case, the database must be defined ex novo: at the inception of the information system, the data collection phase must be planned, as previous sources did not yet exist. Setting up a database into which to pour data acquisitions requires different prior planning and management. In the case of DataONE, instead, the scientific reference data already existed and came from different data sources. The problem to be tackled is not easier, because data can be heterogeneous, and this can be an issue, but the collection phase of DataONE provides a series of rules to be applied to integrate the various datasets.
The final phase may not always be present in a data management strategy, but in general it is necessary to consider the concept of
data waste [
5], and it may be necessary to plan for data disposal or destruction. The data deletion process is implemented when the organization wishes to dispose of inactive or obsolete data. This approach has two advantages: it reduces both storage costs and the risks of non-compliance with prevailing regulations. Note that, in the case of sensitive data, physical backup media must also be securely destroyed. Examples are digital health DLMs or video surveillance models [
6]. In these cases, freeing up space in the storage subsystems may become a routine operation to be foreseen in the DMP by planning the final deletion of data; for other DLMs this is not mandatory, but here it is a recurrent, cyclic activity performed at high frequency. Given the adaptability and diversity of DLMs in many organizational scenarios, choosing, ranking, and tailoring the DLM stages to the unique requirements, limitations, and goals of each situation is a crucial task. In such contexts, determining the best DLM configuration becomes difficult and frequently calls for striking a balance between qualitative and quantitative factors.
Multi-Criteria Decision-Making (MCDM) approaches are increasingly being used to assist in the assessment and choice of DLM components in order to address this complexity; among them, AHP has proven to be one of the most successful because of its domain neutrality and its capability of breaking down decision problems. By combining quantitative data and expert judgment, AHP makes it easier to compare various criteria and options. Recent studies have applied AHP in various decision-making and risk assessment contexts [
7], demonstrating its effectiveness in guiding structured evaluations. For instance, ref. [
8] presents an AHP-based Information Security Risk Assessment framework, ref. [
9] applies AHP to identify vulnerable IoT components in healthcare systems, and [
10] utilizes AHP to define appropriate countermeasures for protecting Personally Identifiable Information.
AHP is a well-established and widespread method in both industrial practice and academic research. In industry, its applications cover, for instance, supplier selection, strategic planning, project prioritization, and quality management, with documented case studies in sectors such as healthcare, manufacturing, and finance [
11,
12,
13,
14,
15,
16,
17,
18]. In academia, systematic reviews report many peer-reviewed studies and applications across domains such as human resource management, operations, and R & D evaluation, further demonstrating robust empirical validation and methodological maturity [
19,
20,
21,
22].
Moreover, in [
23], AHP was adopted for the strategic assessment of healthcare agencies; in [
18], the authors used AHP in mining engineering for mine-planning risk assessment, investment analysis, and qualitative decision making. Meanwhile, other works used AHP in urban and architectural strategic planning in order to evaluate different design solutions [
24]. Human resources emerges as another domain in which the technique has been applied, for the recruitment of new employees [
25]. Lastly, in [
26], AHP was used as a model for selecting business processes for software management.
In our work, AHP is applied to support the configuration and evaluation of DLMs tailored to privacy-sensitive contexts.
3. Rationale
Management of information systems is a consolidated practice that can be considered one of the first management activities to assume a specific and autonomous characterization, together with software engineering. Notwithstanding its long evolution, this discipline still provides methodological contributions to professionals because of the evolution of architectures, applications, needs, and cultural perspectives. Like software engineering, management of information systems is strongly rooted in the needs of professional engineering practice and has to cope with a variety of requirements and a variety of possible choices in the design of the best possible solutions for users’ problems, taking into account all practical constraints, such as technical limitations, cost viability, maintenance needs, performance requirements, and compliance. Some of these constraints require early evaluation and enforcement, and consequently influence design choices in the very first phases of a design cycle, whatever the chosen approach. This is, for example, the case of privacy requirements rooted in the GDPR, which prescribes privacy-by-design and privacy-by-default, or of performance requirements for systems, whose consequences are in general analyzed and verified even before the definition of the system architecture, using the desired behaviors of the system emerging from functional specifications as a source for quantitative or qualitative models, such as Petri nets [
27], queuing networks [
27], fault trees [
28], or even more complex conceptual design support tools such as multiformalism modeling approaches [
29].
Ideas and methods from software engineering, of course, provide consistent and general support to design, especially for what pertains to the organization and management of design processes.
The experience of performance evaluation, a peculiar branch of software engineering, suggests intervening in the very early stages of the project: here, however, it is not possible to leverage the system’s behaviors, i.e., the functional specifications, since the object of the project is the creation of a data management cycle on which different applications will be based over time, not a software product. The available specifications, which will generate the technological specifications for the system, are probably non-functional ones relating to the quality of the lifecycle, to privacy constraints, and to performance: all specifications that are generally expressed at a high level and that have a mainly qualitative nature, or can be chosen only on a comparative basis.
The comparison occurs by comparing different solutions based on a plurality of quali–quantitative factors, but the area in which literature and professional practice provide the greatest support, the technological one, is in this case partially fixed identically for all alternatives, since the platform often already exists, and cannot be defined a priori in detail for the remaining part, because it is a consequence of decisions that depend on comparisons (and the same could be true for the available data, or part of them). This therefore makes it necessary to structure an approach to the choice and design of DLMs that can make use, from the early stages of the project cycle, of methods suitable for supporting informed choices on high-level aspects of the systems.
In this work, therefore, we propose to base the decision-making process on a solution from the literature, known as AHP; other methods could also be used, but they are outside the scope of this work and are currently under analysis for further developments.
4. The Analytic Hierarchy Process (AHP) Method
AHP is a modeling method belonging to the family of the so-called
multi-criteria decision models that was originally proposed by Saaty [
30,
31,
32]. Given a set of alternatives and criteria, it is used to obtain priorities (or weights) starting from the construction of a pairwise comparison matrix and the calculation of its principal eigenvector.
This modeling technique allows for approaching the problem as a structured, multi-level hierarchical decomposition via (i) definition of the general objective; (ii) definition of the alternatives that need to be compared and evaluated; and (iii) definition of the set of criteria (and possibly sub-criteria) that contribute to the final mark of an alternative. Each criterion is compared with all others via a relative importance measure defined on Saaty’s scale.
Saaty also proposed a numeric scale to evaluate each criterion, composed of numbers ranging from 1 to 9 and defined in the following
Table 1. The intermediate values (2, 4, 6, 8) can be used to express intermediate judgments, when one factor is only slightly more important than the other.
Pairwise comparisons allow for constructing the pairwise comparison matrix A = (a_ij), in which each a_ij indicates the relative importance of criterion i with respect to criterion j: a_ij = 1 if criterion i is as significant as criterion j; a_ij > 1 if i is more significant than j.
The matrix A is reciprocal, meaning that a_ji = 1/a_ij. The priority vector w, in which each element represents the relative weight of each criterion, is drawn from the pairwise comparison matrix; this corresponds to solving the eigenvalue problem, as follows:

A w = λ_max w

in which λ_max is the largest eigenvalue of matrix A.
The priority vector w is then obtained by normalizing the principal eigenvector associated with λ_max so that the sum of its elements equals 1. The normalized principal eigenvector represents the relative weights of the criteria.
In order to assess the reliability of the given pairwise judgments, a consistency check is performed through the calculation of the consistency index (CI), as follows:

CI = (λ_max − n) / (n − 1)

allowing for calculating the consistency ratio (CR) using the following formula:

CR = CI / RI

in which RI is the random index for a matrix of order n. Accepted values of CR are those less than 0.1, meaning that the degree of inconsistency is within a tolerable limit and the judgments are considered consistent [
33].
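To make the computation above concrete, the following is a minimal sketch (not the paper’s tool; the function name and the example matrix are ours) of how the priority vector and the consistency ratio can be derived from a pairwise comparison matrix:

```python
import numpy as np

# Saaty's random index (RI) values for matrix orders 1..10
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]

def ahp_priorities(A):
    """Return the normalized priority vector w and the consistency ratio CR
    of a reciprocal pairwise comparison matrix A (order 3..10)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                # principal eigenvalue lambda_max
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                            # normalize so the weights sum to 1
    ci = (lam_max - n) / (n - 1)               # consistency index
    cr = ci / RANDOM_INDEX[n - 1]              # consistency ratio
    return w, cr

# Three criteria: the first is 2x as important as the second and
# 4x as important as the third (perfectly consistent judgments).
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w, cr = ahp_priorities(A)   # w ≈ [0.571, 0.286, 0.143], cr ≈ 0
```

For a perfectly consistent matrix, λ_max equals n and CR is 0; as stated above, judgments are usually accepted when CR < 0.1.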
The AHP logic presented so far makes the decision process more structured and clear to the decision maker. Nevertheless, in a context with a large number of criteria, n(n − 1)/2 comparisons are required, meaning that as n increases, the time and the cognitive burden on the decision maker increase, making AHP less appealing. In order to tackle this problem, a simplified variant of AHP has been proposed, named
AHP-Express. AHP-Express [
34] represents a light version of the standard AHP because it reduces the number of pairwise comparisons required while still relying on the core idea to approach the problem hierarchically. Here, the difference resides in the choice of a criterion as a
reference element, which is considered the most dominant among all the others, so that all the comparisons are made against such an element; this drastically decreases the number of pairwise comparisons to n − 1. The priority of each element j against the reference i is given by

p_j = (1/a_ij) / (1 + Σ_{k ≠ i} 1/a_ik)

where i is the index corresponding to the reference factor R; j is the index of a non-reference factor; and a_ij denotes the user-assigned comparison value of R against j.
If the comparisons are consistent, this priority corresponds exactly to the eigenvector of the full pairwise comparison matrix.
This highlights the following key benefits:
Reduction in the overall analysis time: compared with traditional AHP, which requires all pairwise comparisons, the express version speeds up the decision process while saving time and human resources, and allows for a more consistent evaluation by avoiding the attention fatigue caused by a large number of comparisons.
Increased acceptability and stakeholder involvement: a more straightforward method can certainly help its acceptability by avoiding the opinion that AHP is too time-consuming and/or too cumbersome.
Oriented iterative review: in data contexts, conditions (e.g., volumes, technologies, and regulations) often change rapidly, so the initial decision about a particular data lifecycle might need to be reviewed; to do so, the ease of use and speed of AHP-Express are crucial.
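As an illustration, the AHP-Express priority computation described above can be sketched in a few lines (a minimal example with names of our own choosing, not code from the tool):

```python
def ahp_express_priorities(ratios):
    """AHP-Express priorities from the comparisons against the reference.

    ratios[j] is the user-assigned value a_Rj of the reference factor R
    against the non-reference factor j (a_Rj > 1 means R dominates j).
    Returns the normalized priorities, reference factor first.
    """
    inv = [1.0 / r for r in ratios]    # unnormalized priority of each factor j
    total = 1.0 + sum(inv)             # the reference itself counts as 1
    return [1.0 / total] + [v / total for v in inv]

# Reference criterion judged 2x more important than the second criterion
# and 4x more important than the third: only n - 1 = 2 comparisons needed.
w = ahp_express_priorities([2.0, 4.0])   # w ≈ [0.571, 0.286, 0.143]
```

With consistent judgments, this reproduces the eigenvector of the full 3 × 3 pairwise matrix at the cost of only two comparisons instead of three.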
5. Methodological Approach
The logical workflow of the AHP-Express decision support process can be adapted to the context of DLM selection [
37]. The structure of the process is articulated as in
Figure 1, in the following steps:
General objective definition: the principal scope of the analysis is here explicitly expressed and placed at the top of the hierarchical structure;
Hierarchy construction: it is defined on several levels, depending on the case;
Definition of the values in the decision matrix;
Choice of the reference element and pairwise comparisons, to determine which element is perceived as more relevant;
Weight calculation, to obtain, following the logic of levels and AHP reference factors, first the local weights of the alternatives for each specific criterion, and then the global priorities by aggregating the weights up the levels of the hierarchy, so as to obtain the global weight of each alternative with respect to the general objective;
Analysis and decision, in terms of a relative ranking of all the alternatives.
AHP should be mapped to the decision issues that characterize the choice of DLM. On the basis of the meta-phases defined in [
2], all DLMs may be mapped onto a set of phases, which are the phases we mentioned in the examples reported in
Section 2 and that can be overall summarized in a set reported in
Table 2. We grouped the meta-phases, for the aims of the application of AHP-Express to our problem, in criteria that semantically represent the role of meta-phases in the logic of the decision process, as in
Table 2, where each line represents a grouping. Each criterion stands for one of the relevant roles that a phase can play in a DLM. Further, the table relates each criterion to the category it belongs to.
Figure 1.
Operational phases of AHP-Express.
To apply the steps to our case, the levels mentioned in step 2 will be organized as follows:
First level: general criteria definition, in terms of two categories, category A, in which the particularly critical criteria have been grouped in case the data to be managed have to be collected from scratch (namely, “New Data”), and category B (namely, “Old Data”), which groups all the remaining criteria, which are to be considered of the same importance regardless of whether the data to be managed are collected from scratch or are already existing data belonging to other projects;
Second level: criteria definition, which corresponds to the six meta-phases identified, as shown in
Table 2.
The values mentioned in step 3 are then defined; in the logic of the decision process, they represent the evaluations for each phase of each selected DLM, expressed by a different DLM expert as a score on Saaty’s scale. This requires the definition of a set of DLMs that is coherent with the purposes of the decision process and provides the alternatives between which the decision process should discern. Here, we define the set by considering a selection of the most significant papers in the literature about data management in relation to DLMs. In this step, consequently, a panel of DLM experts selects the set of DLMs and assigns a score for each criterion of each DLM.
The idea of this work is to take advantage of the combination of main phases and transversal phases of the main DLMs to design and implement a tool for the assisted selection of the DLM best suited to the needs of a specific data management problem, by exploiting a pre-evaluation of DLM phases performed by the mentioned panel. The grouping of the phases into the six criteria of
Table 2 is meant to simplify the comparison of DLMs and to generalize the applicability of the tool.
There are a number of different DLMs developed by academics and professionals over the past years, and in [
2], 78 different DLMs are mentioned. To compile a list of significant DLMs for further analysis, the idea we use for choosing the models is to consider all models that have at least one high rating among the criteria. In this way, we have selected 10 different DLMs, as follows:
USGS [
3] and DDI [
38] because of the relevance in the definition of
Starting criterion;
HINDAWI [
39] for the
Assessment criterion;
DataONE [
4] and CIGREF [
40] for the
Computation criterion;
DCC [
41] for the
Administration criterion;
IBM [
42], PII [
43], and CRUD [
44] for
Security;
Enterprise Datalifecycle (or EDLM) [
45] for the relevance of criterion
End-of-Life.
The chosen DLMs have been selected because they are well consolidated and used both in academia and in industry, and in particular, USGS has been developed by a government agency (U.S. Geological Service); DataONE, IBM, and CIGREF have been implemented by industries; and all other models are the result of academic work. Our selection process has been based on a ranking of the various phases in which they are articulated, considering them in the framework presented in [
2], reconsidered in terms of
use,
reuse, and feedback,
share,
publish, and
governance phases. This process has been documented in another paper, currently submitted for publication; consequently, it is outside the scope of this work.
To give a glimpse of the process anyway: in the case of DCC, we rank it as 10 in administration in
Table 3 because this category includes and summarizes the use, reuse and feedback, share, publish, and governance phases, which all contribute to a high ranking, as DCC is a DLM for digital artifacts supported by conformance to OAIS and ISO 15489. In the case of CRUD, we rank it as 10 in security because it can accommodate different processes but mandatorily includes the Create, Store, and Destruct phases, which are of paramount importance to control the security aspects of data acquisition, maintenance, and disposal, ensuring that data are only stored as long as their permanence in the system is justified and that their disposal is guaranteed and controlled after that moment. Only DLMs from the literature that exhibit one or more high ranks in the relevant categories have been selected, unless they exhibit characteristics that are not totally covered by others, as in the case of CIGREF [
46] or DDI. For the moment, readers can refer to the authors for details about the overall analysis, while the publishing process is ongoing.
The selected panel, composed of the authors and a group of data management experts who agreed to remain anonymous because of their positions, evaluated the six criteria for the ten DLMs, as reported in
Table 3. This matrix has been used to accomplish step 3.
Steps 4, 5, and 6 are specific to each single application of the decision process that a user of our method may want to perform to choose the best-fitting DLM. In step 4, AHP should be applied by eliciting preferences about the prioritization of criteria, in comparison with a chosen reference criterion: this step may be assisted by software. The reference is meant to be the criterion considered most important in category A and in category B. To build the top-most level of the AHP hierarchy, category A groups the criteria that are more relevant when the decision concerns a DLM for systems in which the prevalent attention is on creating a new data repository from scratch, or in which new data feeds are more relevant than existing ones, while category B groups those that are more relevant when the attention is on augmenting an existing repository, or existing data are more relevant than new data feeds.
Step 5 simply applies the AHP-Express computation, while step 6 allows for evaluating the final ranking of the proposed DLM alternatives, which may also be assisted by software. As the decision should be as informed as possible within the limits of the user’s experience, in this step extended support to the user, in terms of both graphical comparisons and numerical information, may greatly improve the value of the presented method.
Consequently, our method is accompanied by a support tool.
6. The Proposed Tool
The purpose of the tool is to support both professionals and academics in applying our method in an easy and fast way, but it is in general suitable to support any AHP-Express-based decision process.
6.1. General Architecture and Workflow
The tool is structured as follows:
The decision maker may interact with the application via the Streamlit interface, which introduces the user to the usage of the tool and guides the user smoothly through the entire process, from data loading to results evaluation.
The first section allows for uploading a file, either as CSV or as Excel, that contains the information about the alternatives that go under analysis. The required structure of this file consists of DLMs as rows and sub-criteria as columns.
The user can assign a weight to the categories presented (which in the case of this paper are identified with “New Data” or “Old Data”), whose sum equals 1. Moreover, if required, the tool also allows one to conduct multiple interviews (of different experts) and assign a weight to each one of them as well. The tool then aggregates the results of the interviews according to a weighted geometric mean.
For each macro-category, the user can define a “reference” that is considered to be initially more relevant; because of this, the needed pairwise comparisons are simplified (as already pointed out) to only n − 1 checks.
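The weighted geometric mean aggregation of several interviews mentioned above can be sketched as follows (an illustrative snippet under our own naming, not the tool’s actual code; expert weights are assumed to sum to 1):

```python
import math

def aggregate_interviews(priority_vectors, expert_weights):
    """Combine the priority vectors of several experts with a weighted
    geometric mean, then renormalize so the result sums to 1."""
    n = len(priority_vectors[0])
    combined = []
    for j in range(n):
        # weighted geometric mean of the j-th priority across all experts
        g = math.prod(pv[j] ** wk for pv, wk in zip(priority_vectors, expert_weights))
        combined.append(g)
    s = sum(combined)
    return [g / s for g in combined]

# Two experts, the first weighted twice as much as the second
agg = aggregate_interviews([[0.6, 0.4], [0.3, 0.7]], [2 / 3, 1 / 3])
```

The geometric mean is the standard way of aggregating AHP judgments, since it preserves the reciprocity of the underlying comparisons.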
The key function of the tool is calculate_ahp_express_prior(), which receives as input the ratios a_ij of the comparisons against the reference and returns the normalized weight (priority) of each factor j according to the following formula:

p_j = (1/a_ij) / (1 + Σ_{k ≠ i} 1/a_ik)
Once the priorities of each sub-criterion within each macro-category (i.e., “New Data” and “Old Data”) have been computed, these weights are combined into a global priority vector. The aggregation is obtained by the following formula:

g_j = w_c(j) · p_j,    S_DLM = Σ_j g_j · v_j

where
g is the global priority vector;
p_j is the priority of the sub-criterion j within its own category (i.e., Cat. A or Cat. B in our case);
w_c(j) is the weight of the macro-category to which sub-criterion j belongs;
v_j is the value of the DLM for the sub-criterion j.
In other words, at first, the aggregation is obtained by multiplying each priority by the weight assigned to the relative macro-category, determined by the user. This results in an overall priority for each sub-criterion, consistent with the hierarchical setting of AHP. Then, the final score of each DLM is obtained as a weighted sum of its values in each macro-category by using final priorities as weights.
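The two-stage aggregation described above can be sketched as follows (category names match this paper; the function name, dictionaries, and numbers are illustrative, not the tool’s internal code):

```python
def dlm_scores(category_weights, local_priorities, dlm_values):
    """Score each DLM by (1) turning local AHP-Express priorities into
    global sub-criterion weights and (2) taking the weighted sum of the
    DLM's values over all sub-criteria."""
    # global weight of each sub-criterion = category weight x local priority
    global_w = {}
    for cat, weight in category_weights.items():
        for sc, p in local_priorities[cat].items():
            global_w[sc] = weight * p
    # final score of each DLM = weighted sum of its sub-criterion values
    return {dlm: sum(global_w[sc] * v for sc, v in vals.items())
            for dlm, vals in dlm_values.items()}

scores = dlm_scores(
    {"New Data": 0.6, "Old Data": 0.4},                       # user-chosen split
    {"New Data": {"Starting": 0.7, "Assessment": 0.3},        # local priorities
     "Old Data": {"Computation": 0.5, "Administration": 0.5}},
    {"DLM-X": {"Starting": 8, "Assessment": 6, "Computation": 7, "Administration": 5},
     "DLM-Y": {"Starting": 5, "Assessment": 9, "Computation": 8, "Administration": 7}},
)
# scores["DLM-X"] ≈ 6.84, scores["DLM-Y"] ≈ 6.72
```

Because both the category weights and the local priorities each sum to 1, the global weights also sum to 1, keeping the final scores on the same scale as the expert evaluations.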
6.2. Sensitivity Analysis
Another distinctive feature of the tool is the possibility to conduct a sensitivity analysis on the variation of the macro-category weights. The function sensitivity_anal() allows for exploring the variation of the weight w_A of the first macro-category (with w_B = 1 − w_A) to observe how each DLM score varies. The procedure is as follows:
Iterates through all the possible values of w_A with a fixed step;
At each step, re-computes combined sub-criteria priorities;
Re-computes final DLM scores;
At the end of the process, a chart is drawn showing all variations in DLM score depending on the weight variations.
In this way, more “robust” models (those whose score does not vary drastically) and more “sensitive” models (those whose score changes markedly with the importance of the categories) are identified.
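The sweep can be sketched as follows (a self-contained illustration; the function name, the data, and the step value are assumptions, since the tool’s actual step is not stated):

```python
def sensitivity_sweep(local_priorities, dlm_values, step=0.05):
    """Vary the weight w_A of the first macro-category from 0 to 1
    (the second weighs 1 - w_A) and record each DLM's final score."""
    results = {dlm: [] for dlm in dlm_values}
    n_steps = int(round(1.0 / step))
    for k in range(n_steps + 1):
        w_a = k * step
        cat_w = {"New Data": w_a, "Old Data": 1.0 - w_a}
        # recombine sub-criterion priorities for the current weight split
        gw = {sc: cat_w[cat] * p
              for cat, subs in local_priorities.items()
              for sc, p in subs.items()}
        # recompute every DLM score for this split
        for dlm, vals in dlm_values.items():
            results[dlm].append(sum(gw[sc] * v for sc, v in vals.items()))
    return results

# Toy example: one sub-criterion per category, coarse step for readability
res = sensitivity_sweep(
    {"New Data": {"Starting": 1.0}, "Old Data": {"Administration": 1.0}},
    {"DLM-X": {"Starting": 9, "Administration": 3},
     "DLM-Y": {"Starting": 4, "Administration": 8}},
    step=0.5,
)
# res["DLM-X"] ≈ [3, 6, 9] and res["DLM-Y"] ≈ [8, 6, 4] as w_A goes 0 → 1
```

A model whose curve stays nearly flat across the sweep is robust to the category split; crossing curves indicate that the final ranking depends on the chosen weights.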
6.3. Output and Advantages for the Decision Maker
The tool integrates some visualization features that produce (i) a bar chart comparing final DLM scores, (ii) a radar chart displaying sub-criteria strengths and weaknesses, and (iii) sensitivity plots showing score variation. At the end of the analysis, the decision maker has a final ranking of the DLMs under evaluation and information about how much each criterion’s weight influences the overall ranking. This allows for making more robust and informed decisions.
Moreover, the strong point of the overall analysis is AHP-Express itself, which drastically reduces the time and human resources dedicated to the task by only requiring, as seen, n − 1 pairwise comparisons. This is particularly useful when the alternatives and the sub-criteria to evaluate are numerous, or when evaluations from a panel of experts are required.
In summary, the presented tool offers an interactive and user-friendly environment for the evaluation of different DLMs via an AHP-Express modeling approach. Because of its modular architecture, the tool offers the following:
The possibility to easily manage input data;
Automatic priority calculation via AHP-Express, drastically reducing time and cognitive load caused by pairwise comparisons;
The evaluation of the stability of the decisions via sensitivity analysis;
The production of useful charts that aim at helping even the less experienced decision makers in the field of data lifecycle modeling.
7. A Running Example: “Museo del Carbone”
The Museo del Carbone belongs to the field of cultural heritage conservation. This Italian museum, situated in Carbonia, is part of the European Route of Industrial Heritage and is dedicated to documenting the historical activities of local coal mining, featuring mineral specimens and historical relics. In this context, temperature, humidity, and light levels are critical for artifact preservation, as variations in these conditions may result in material degradation.
In this instance, data management is focused on monitoring and regulating parameters to mitigate threats to artifacts, while maintaining the usability of the site and ensuring safety. This case, developed during a doctoral research project, is based on a technical report on the design of sensing garments for museums, and serves as a suitable candidate case study for selecting the most appropriate DLM system to manage the data lifecycle generated by sensors, aiming for optimal sensor allocation and precise artifact monitoring.
The objective is to use the proposed tool to identify the most appropriate DLM to support the data management of sensor-based artifact monitoring within the museum, balancing usability, security, administration, and computational load across all lifecycle phases. The first step is choosing between the pre-configured decision matrix and a customized one provided in CSV format, as shown in
Figure 2.
The next step is to perform the interviews with the experts' group (
Figure 3). Since our primary objective at this stage is to test the tool as a Decision Support System, highlighting not only the top-ranked DLM but also possible alternatives worth discussing, the group is composed of only three experts, so as to avoid an overly characterized output. The results of the interviews are the priority vectors for both the A and B categories, as shown in
Figure 4.
At this point, all the needed information has been loaded and the computation is performed. The final ranking is shown in
Figure 5.
All obtained results are also shown in
Figure 6, a bar chart of the final DLM ranking, in which scores are calculated from the weighted criteria. The three top-performing DLMs are the following:
Hindawi (7.9);
DCC (7.4);
CIGREF (7.2).
Figure 6.
Bar plot with the overall final ranking.
Better insights can be gained from the radar plot in
Figure 7, which shows the performance of the ten DLMs across the six lifecycle criteria. The power of this plot resides in its ability to expose the strong and weak points of each model at a glance, supporting the decision by presenting all elements in a single synthetic view. Models such as DataONE and DCC exhibit strong performance in computation and administration, while Hindawi scores consistently high across almost all sub-factors, suggesting an overall balanced and robust framework. In contrast, IBM and EDLM reveal limitations in several dimensions, particularly in the End-of-Life and Starting phases, respectively, suggesting gaps in lifecycle completeness or implementation practicality.
A sensitivity study was carried out by progressively changing the weight of category A, a high-priority criterion (such as Security or Starting), in order to evaluate the ranking robustness.
Figure 8 displays the findings.
The figure shows that Hindawi, DCC, and DataONE are highly resilient to shifts in the relative importance of specific criteria, maintaining their rankings with little variation. Conversely, as the weight of category A increases, IBM and EDLM deteriorate further, confirming their unsuitability in scenarios where such priorities are critical.
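The sensitivity study described above can be sketched as follows: the weight of one high-priority criterion is progressively increased, the remaining weights are rescaled so the total stays at 1, and the ranking is recomputed at each step. All names and numbers below are illustrative, not the case-study values.

```python
def rescale(weights, target, new_w):
    """Set weights[target] = new_w and scale the others to keep the sum at 1."""
    rest = 1.0 - new_w
    old_rest = sum(v for k, v in weights.items() if k != target)
    return {k: (new_w if k == target else v * rest / old_rest)
            for k, v in weights.items()}

# Hypothetical scores for two DLMs over two criteria (0-10 scale)
scores = {"Hindawi": {"Security": 8, "Computation": 7},
          "IBM":     {"Security": 4, "Computation": 8}}
base = {"Security": 0.5, "Computation": 0.5}

for w in (0.5, 0.6, 0.7):
    wts = rescale(base, "Security", w)
    ranking = sorted(scores,
                     key=lambda d: sum(wts[c] * scores[d][c] for c in wts),
                     reverse=True)
    print(w, ranking)
```

In this toy setup, the low-Security model's score drops as the Security weight grows, mirroring the behavior observed for IBM and EDLM in the study.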
8. Interactions with the Technological Variable: The Blockchain in Data Lifecycle Management
The presented approach provides a general method for assessing solutions in the earliest phases of a design process. However, DLM characteristics are not the only factor that can be exploited to guide preliminary decisions once the main evaluations have been carried out. Technological factors may also be considered, defining a second-approximation scenario that may confirm or overturn a preliminary decision. Blockchain technology provides a significant example, which allows us to show how the proposed approach may be used for the next step of the design process.
Blockchain technologies present unique features, such as immutability, decentralization, transparency, and cryptographic security, that may influence the phases of the data lifecycle, introducing constraints as well as specific advantages and disadvantages in some parts of DLMs. The impact of blockchain on DLM phases may be described as follows [
47,
48,
49]:
Planning/Collection phase: data are created as transactions or blocks in a distributed ledger and are cryptographically hashed for integrity; smart contracts can automate data generation based on predefined rules;
Share/Governance phase: the sharing phase may benefit from the distributed nature of blockchain;
Archival/Disposal phase: blockchain systems are typically append-only systems; the deletion of an item is not simple to implement [
50];
Data assessment: the distributed nature of blockchain makes this phase more challenging;
Analysis/Storage phase: data are replicated across nodes, ensuring redundancy and availability;
Computation: this phase could benefit from the distributed nature of blockchain;
Security phase: blockchain systems have a high built-in security level due to the high tolerance to attacks [
50].
It is therefore relevant to highlight that the strong built-in security of blockchain systems can ensure security compliance even if the chosen DLM has a weak (or no) security phase. At the same time, it is clear that any data deletion phase is hindered by the intrinsic immutability of blockchain. With this in mind, for the same scenario described in the previous section, in case of a blockchain implementation the expert group could hypothetically choose to modify the scores of some criteria in the decision matrix.
The modified decision matrix is shown in
Table 4.
Using those values in the proposed tool, with the same expert-determined priorities as input, the global scores of the DLMs change: in this case, the three DLMs evaluated as the best alternatives are the following:
Hindawi (7.5);
DCC (6.8);
USGS (6.7).
The overall ranking of all DLMs in this case is shown in
Figure 9.
It can therefore be observed that, in the case of a blockchain implementation, the results change significantly. In the new list, the best rank is obtained by Hindawi, followed by DCC, as in the previous case, while the third rank is obtained by the USGS DLM, which does not have any security phase: this means that, in a blockchain-based implementation, even a simple but solid DLM may be worth considering, thanks to the intrinsically high security of the blockchain environment.
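The effect just described can be sketched in a few lines: when the platform itself guarantees security, the panel may lower the weight of the Security criterion, which lets a model with no security phase climb in the ranking. The scores below are made up purely for illustration (a "USGS"-like entry scoring 0 on Security stands in for a DLM lacking that phase).

```python
# Hypothetical per-criterion scores (0-10); 0 = the phase is absent
scores = {
    "Hindawi": {"Security": 8, "Starting": 8},
    "USGS":    {"Security": 0, "Starting": 9},
}

def rank(weights):
    """Order the DLMs by their weighted score under the given weights."""
    return sorted(scores,
                  key=lambda d: sum(weights[c] * scores[d][c] for c in weights),
                  reverse=True)

print(rank({"Security": 0.6, "Starting": 0.4}))  # security-critical scenario
print(rank({"Security": 0.1, "Starting": 0.9}))  # blockchain covers security
```

Under the first weighting the security-complete model wins; under the second, the simpler model overtakes it, which is the qualitative shift observed in the blockchain scenario.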
9. Conclusions and Final Remarks
In this paper, the application of the AHP-Express method to DLM selection has been investigated, and a tool for decision support in this field has been developed. In the evaluated example, the proposed tool proved effective in assessing the suitability of DLMs to the specific problem, while maintaining visibility of a wide range of indices useful for a well-reasoned selection of the best DLM.
It should also be noted that the proposed tool, in the current version, does not interact with existing data management software, which limits its applicability in dynamic contexts where DLM data may evolve over time.
Future work will span two different directions. First, the impact of blockchain on data management and DLMs will be explored in more depth, and a series of case studies will be proposed. Second, the tool will be integrated into a wider project that will also include risk–benefit analysis, defining a proper risk–benefit model for this purpose. Regarding the proposed tool itself, the radar plot representing the ranking of the alternative DLMs will be restricted to a limited number of alternatives, improving the readability of the results and easing their understanding.