1. Introduction
The University of Houston (UH) Libraries, in partnership and consultation with the University of Victoria (UVic), Texas Digital Library, Indiana University at Bloomington (IUB) and Indiana University-Purdue University Indianapolis (IUPUI), University of Miami (UM), and primary community stakeholders, including Stanford University, DuraSpace, and the Digital Public Library of America (DPLA), was awarded $249,103 in funding from the Institute of Museum and Library Services (IMLS) National Leadership/Project Grant (LG-70-17-0217-17) to support the creation of the Bridge2Hyku (B2H) Toolkit [1]. The toolkit is intended to help libraries and cultural heritage institutions build the capacity to adopt Hyku [2], a turnkey open source digital system formerly known as Hydra-in-a-Box. While Hyku offers a robust, extensible platform for making digital objects accessible over the web, institutions cannot take advantage of the benefits the platform offers until they confront the difficulties of migrating legacy data from other systems. As the results of the Hydra-in-a-Box User Survey [3] demonstrate, many institutions want to migrate from their current digital repository system, but they lack the staff expertise and resources to plan and execute a successful data migration. These problems are compounded by a lack of general guides, tips, and hands-on tools to assist libraries in completing the digital asset migration lifecycle.
To help address these problems, the UH Libraries, in partnership with the institutions mentioned above, has leveraged the IMLS grant funds to build the B2H Toolkit. In the process of building the B2H Toolkit, the project team conducted a digital collections survey and gathered user stories and use cases to collect feedback from practitioners on digital migration needs. The project team intentionally incorporated this feedback into the iterative development of migration strategies and tools. By addressing users’ needs, the B2H Toolkit guides institutions and migration practitioners to a better understanding of their digital library systems and how they can prepare for and conduct a digital content migration. This article outlines the research the project team conducted and the process the team followed in developing the toolkit, which fills current gaps in the migration process by offering libraries and cultural heritage institutions strategies and tools for migrating their digital collections to Hyku.
2. Literature Review
Digital asset management systems (DAMS) have evolved over time as the technologies that support them have been refined and user needs and expectations have shifted. Not surprisingly, libraries have come to reassess, select, and migrate to new DAMS based on changes in technology and user needs. Stein and Thompson found that the rationales for switching systems were as varied as the institutions and consortia that underwent the migration process. According to them, some organizations determined criteria for evaluating new systems based on their dissatisfaction with key functions and services, while others were driven by future needs, particularly a system’s scalability and extensibility. Examples of lackluster features and functionality in the literature identified by Stein and Thompson include the quality of vendor technical support, underperforming search results and poor item discoverability, and the ongoing costs to license and maintain proprietary systems [4]. An additional investigation by Stein and Thompson highlighted explicit metadata needs when migrating from one repository to another. They noted that repositories should accommodate multiple or all metadata schemata, metadata reuse, and digital object identifiers. To better address these needs in the future, they concluded that metadata needs should be discussed at the outset of a DAMS migration, and metadata specialists should be involved in all stages [5].
A growing number of libraries have gone through a repository reassessment process and decided to adopt open source digital solutions for desired functionality, including increased robustness, scalability, content type support, DAMS customization, and community support. Research findings by Stein and Thompson indicate that institutions undergoing reassessment and migration are often moving from proprietary systems, such as CONTENTdm, DigiTool, and Rosetta, toward open source solutions, including DSpace, Fedora, and Islandora. They noted several case studies that exemplify this trend: the College of Charleston went from CONTENTdm to a combination of Fedora, Drupal, OpenWMS, and Blacklight; the State Library of North Carolina moved from OCLC’s Digital Archive preservation repository to DuraCloud; Texas Tech University migrated assets from CONTENTdm to DSpace; and the Florida Council of State University Libraries collaboratively shifted from various proprietary and open source solutions (CONTENTdm, DigiTool, SobekCM) to Islandora [4]. University of Oregon Libraries and the Oregon State University Library adopted Hydra (now Samvera) software as the next iteration of Oregon Digital, pursuing open source technology that is flexible, extensible, and incorporates Linked Open Data into all of their digital content in a triple store, Opaquenamespace [6]. After a comprehensive evaluation and testing of DAMS on the market, the University of Houston Libraries identified the Samvera open source repository framework to replace their CONTENTdm repository for digitized cultural heritage materials [7].
CONTENTdm is one of the most prominent proprietary DAMS on the market. This software is deployed widely across the globe by over 2000 organizations covering the full spectrum of library types and sizes, including academic, public, consortial, and special libraries. Stein and Thompson, Myntti and Woolcott [8], and the Hydra-in-a-Box surveys reveal that CONTENTdm was either the first or second most frequently used DAMS. While many institutions continue to use CONTENTdm, there are also documented case studies where other organizations have decided to migrate away from the product. Technical and access issues and increasing cost are the primary reasons for institutions’ departures from CONTENTdm. In “A Clean Sweep: The Tools and Process of a Successful Metadata Migration,” Neatrour et al. pointed out the changing needs and scalability issues that drove the decision for the University of Utah’s J. Willard Marriott Library to migrate away from CONTENTdm to Solphal, a homegrown system based on Hydra that leverages their Submission Information Metadata Packaging Tool and Apache Solr. Their migration led to a large-scale metadata normalization and standardization project which enhanced the consistency of metadata and improved discovery of digital content [9]. The J. Willard Marriott Library also conducted a massive newspaper migration from CONTENTdm to Solphal after a DAMS review that showed scalability, performance, reliability, and user interface issues [10]. The College of Charleston “broke up” with CONTENTdm because of negative experiences with CONTENTdm support, the system’s disappointing and inconsistent search interface and results, and the increasing cost for the unlimited license [11].
Repository migration is a complex process. Researchers have reported numerous challenges that digital collections migrators may encounter and the techniques used to overcome them. Digital collection migrators have been working diligently to overcome barriers in their repository migration to open source solutions. These barriers include the lack of tools and shared best practices that can guide migration planning and assist with metadata analysis, schema mapping, and content migration. Wilcox and Tripp note that open source communities often coalesce around tooling to support repository migrations, in their case Fedora: the application stack supporting popular open source repository solutions such as Islandora and Samvera. The success of these tools depends on the ability of each community to harness development effort in order to address the widest possible set of use cases. Lack of developer interest and local repository customizations present the most significant barriers to community-driven migration tool implementation [12].
The methodology, tooling, and desired outcomes of a repository migration depend largely on the environment in which the migration occurs. Hardesty and Homenda present a case study outlining the potential complexities of open source repository migration projects when many applications, services, workflows, and personnel are impacted by a new system with a fundamentally different data model. Such migrations are not as simple as moving digital objects from one platform to another because the context of an object, and the corresponding structure that it should take in the repository, may be different depending on the service that provides access to the object. They emphasize the importance of thorough collection inventory analysis and careful planning among a diverse group of stakeholders when contemplating the migration of a repository ecosystem [13].
In contrast, Woodward provides an example of a relatively straightforward migration from Bepress to DSpace using standard repository data exports, custom Ruby tooling for metadata mapping and ingest packaging, and a standard repository ingest mechanism. Although not without its share of issues to address, the migration described by Woodward exemplifies the best possible scenario for digital content migration: moving metadata and bitstreams from the source repository to the destination repository with minimal transformation and no system interdependencies to account for [14].
3. Developing the Bridge2Hyku Toolkit
To enable institutions’ quick adoption of open source digital solutions, the University of Houston Libraries has been leveraging the IMLS project grant fund to build the B2H Toolkit. Originally a two-year project, the grant was divided into three phases. In phase one, the team conducted a digital collections survey of partner institutions, generated use cases, and identified metadata and system requirements for crosswalking data from CONTENTdm to Hyku. This phase lasted approximately eight months, from October 2017 to May 2018. Phase two was dedicated to developing and documenting the B2H Toolkit and building out the B2H website. It occurred over a 13-month period, from May 2018 to June 2019. Phase three, designed to last one year but extended at no cost, was used for assessing and enhancing the B2H migration tools as well as promoting the B2H Toolkit. It is currently underway, having started in June 2019, and will conclude in September 2020. The project team extended the grant to develop a sustainability outlook plan, to integrate the HyBridge software into Hyrax, and to continue to promote the toolkit. The B2H project team used an Agile framework to achieve project goals and outcomes during each phase. The team solicited and incorporated migration practitioners’ feedback in all phases of the project.
3.1. Environmental Scan
To accomplish the task designed for phase one of the project, the B2H project team issued a digital collections survey to partner institutions that solicited data on digital collections, including: repository size, work and file types, digital systems infrastructure, metadata schemata, controlled vocabularies, workflows, institutional characteristics, and stakeholders’ considerations. Responses from the survey were used to assess the digital collection migration needs and considerations of B2H project partner institutions. This information was also useful in later phases of the project, such as identifying pilot testers and generating informational content for the website.
The survey results provided the B2H team with vital pieces of information that drove migration software development and migration best practices content. Below are the key implications for migration tools and strategies from the survey analysis:
B2H Migration Tools:
Migration tools must accommodate images, text and audio/visual materials in numerous access file types;
Migration tools should accommodate single, hierarchical, and multipart digital object structures;
Migration tools should be flexible enough to accommodate multiple metadata schemata and standards;
Migration tools should reconcile numerous controlled vocabularies, including emerging vocabulary such as GeoNames;
Migration tools should account for various system dependencies for digital collections;
Migration tools should allow for file renaming in transit.
B2H Migration Strategy:
The B2H Toolkit must include documentation on migration planning, including: content and metadata analysis, mapping of metadata elements and values, metadata standardization, and digital preservation considerations;
The B2H Toolkit should include documentation on migration workflows, including: metadata cleanup and remediation methods, materials reprocessing recommendations (e.g., rescanning and rerunning OCR), and step-by-step migration instructions;
The B2H Toolkit could include a bibliography of best practices for digital collections migration.
3.2. User Stories and Use Cases Development
In order to guide development of the B2H Toolkit, a Use Case Working Group was convened with the Project Manager, Content Strategist, Migration Strategists, Software Developers, and the project’s Metadata Advisor as participants.
The Use Case Working Group was tasked with formulating use cases that could then be prioritized into a minimum viable product (MVP) for the B2H Toolkit. In this instance, the users are the people responsible for migrating digital collections from CONTENTdm to Hyku. The group consulted sources such as the Use Cases guide at usability.gov [15] and employed the language of Agile development to create a set of statements in the form of “As a [role], I want [task].” Since this working group was focused on the migration process, the only role considered was that of migration specialist/migrator. Thirty-six statements were developed, e.g., “As a migrator, I want to map my existing metadata fields to Hyku’s metadata schema.” Driven by the findings of the Digital Collections Survey and drawing from conversations with migration experts, the statements addressed key areas of focus when embarking on a migration: mapping metadata schemata and values from the source system to the target system; different digital object models (multiple file vs. single file vs. hierarchical); accounting for various formats (textual, audio, video, etc.); the process of moving to uniform resource identifiers (URIs) or controlled vocabularies in the target system; managing transcriptions at the object and file level; tools for cleaning metadata; process reporting and error handling; and documentation.
Once these statements were finalized, they were sorted using the MoSCoW prioritization method [16]. This technique assigns one of four categories to requirements: 1. “Must Have”; 2. “Should Have”; 3. “Could Have”; 4. “Won’t Have/Nice to Have”.
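The sorting step can be illustrated with a minimal Python sketch. It is not the project team's tooling: the class and function names are hypothetical, only the first example story is quoted from the text, and the priorities assigned to all three stories are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """The four MoSCoW categories, ordered from most to least urgent."""
    MUST = 1
    SHOULD = 2
    COULD = 3
    WONT = 4


@dataclass(frozen=True)
class UserStory:
    text: str
    priority: Priority


def build_backlog(stories):
    """Order stories by MoSCoW category; sorted() is stable, so ties keep input order."""
    return sorted(stories, key=lambda story: story.priority)


def mvp_scope(stories):
    """Keep only the "Must Have" stories as a candidate MVP scope."""
    return [story for story in stories if story.priority == Priority.MUST]


# Illustrative data: the second story is quoted from the article; the other
# stories and every priority assignment here are hypothetical.
STORIES = [
    UserStory("As a migrator, I want a bibliography of migration best practices.", Priority.COULD),
    UserStory("As a migrator, I want to map my existing metadata fields to Hyku's metadata schema.", Priority.MUST),
    UserStory("As a migrator, I want to rename files in transit.", Priority.SHOULD),
]

if __name__ == "__main__":
    for story in build_backlog(STORIES):
        print(f"{story.priority.name:<6} {story.text}")
```

Using an ordered enumeration keeps the four categories explicit and lets a plain sort produce the backlog, while the MVP filter simply selects the first category.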
These user stories and use cases, as shown in Appendix A (see Table A1), served as the initial backlog of work. As the work continued, stories and use cases were modified and reprioritized, and new stories were created to more comprehensively reflect user needs. Many of the identified user needs fit into two broad categories that require different approaches and pathways. Some use cases and user stories led naturally to developing or adopting specific software and tools (made available in the Toolkit portion of the B2H site and GitHub [17]). Other user stories highlighted the need to provide documentation about Hyku and migration processes in general (made available in the Migration Planning and About Hyku sections of the B2H site).
3.3. Migration Strategies
Guided by the use cases, the team developed B2H website content to help users to contextualize their migration, evaluate their content, and learn about tools and best practices. This was done through periodic blog posts and the Migration webpage. While some site content is specific to a migration from CONTENTdm to Hyku, most of the information is cross-platform and can be applied to different migration scenarios.
The Migration page of the website walks through different components of migration planning and strategy. See Table 1 for an outline of the Migration page.
The Migration Overview and Migration Planning sections guide users through a series of questions to help them to characterize their collections and define the broader context in which the migration will take place. Through this analysis, managers, team leaders, developers, and metadata specialists gain a better understanding of factors and parameters that will inform migration strategy and implementation.
Targeting migration implementers, the Normalize Metadata, Migrate Content, and Content Verification sections provide guidance on choosing the best strategies and completing the practical steps of system migration. Again, these sections are not prescriptive instructions but instead provide general advice and points of consideration for migrators to apply to their unique situation. Recognizing the need for more defined information, the B2H team wrote blog posts such as “Librarians Love Open Refine” and “Improving your Metadata Application Profile for System Migration” that include links to tutorials and more specific recommendations for migration work. Links to other migration resources are included in the Migration Resources section and blog posts.
3.4. Migration Tools
In addition to the user stories addressing migration strategies, a set of user stories targeted tools that would assist in the system migration process. The team initially evaluated current tools that could meet user needs and identified the tool developed by the University of Victoria: CDM Migrator [18]. This tool, integrated into Hyku, connects to CONTENTdm (CDM) to export individual collections to a CSV file for metadata normalization. The UH team evaluated and tested the CDM Migrator to enhance and improve its functionality.
As a complement to the CDM Migrator, the B2H team developed a tool called CDM Bridge [19] that metadata practitioners and system administrators could easily install and use to evaluate metadata and, eventually, to ingest digital content into Hyku. This tool fulfills requirements from user stories such as crosswalking and exporting metadata from CDM, reporting errors on missing field data, and normalizing field values prior to migration. Additionally, the tool is extensible and flexible enough to enable the user to output metadata fields for Hyku or other repository software.
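To make the three requirements concrete (crosswalking fields, reporting missing field data, and normalizing values), the following is a minimal Python sketch over a CSV export. It does not reproduce CDM Bridge's code or file formats: the column names, the target property names, the required-field rule, and the semicolon delimiter for multiple values are all assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical crosswalk: source column name -> target property name.
CROSSWALK = {
    "Title": "title",
    "Creator": "creator",
    "Date": "date_created",
    "Subject": "subject",
}
REQUIRED_TARGETS = {"title"}   # assumed rule: every record needs a title
MULTI_VALUE_SEPARATOR = ";"    # assumed delimiter for multi-valued fields


def normalize(value):
    """Trim the value and collapse internal runs of whitespace."""
    return " ".join(value.split())


def crosswalk_csv(csv_text):
    """Map each CSV row to target fields; return (records, error messages)."""
    records, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows are numbered from 1, not counting the header line.
    for row_number, row in enumerate(reader, start=1):
        record = {}
        for source_field, target_field in CROSSWALK.items():
            raw = row.get(source_field) or ""
            values = [normalize(part) for part in raw.split(MULTI_VALUE_SEPARATOR)]
            values = [value for value in values if value]
            if values:
                record[target_field] = values
        for missing in sorted(REQUIRED_TARGETS - set(record)):
            errors.append(f"row {row_number}: missing required field '{missing}'")
        records.append(record)
    return records, errors


SAMPLE = (
    "Title,Creator,Date,Subject\n"
    "  Main  Street ,Doe; Roe,1920,Streets;Houston\n"
    ",Unknown,1931,\n"
)

if __name__ == "__main__":
    mapped, problems = crosswalk_csv(SAMPLE)
    print(mapped)
    print(problems)
```

Keeping the crosswalk in a plain dictionary means the same routine can target a different repository's field names by swapping the mapping, which mirrors the extensibility goal described above.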
The B2H team also developed HyBridge [20], which takes the CDM Bridge output files (metadata and image files) and ingests them into Hyku. HyBridge is installed within the Hyku application and is fully integrated with Hyku’s Dashboard, allowing users to select and import their ingest packages.
The tool development process incorporated an agile approach based on the Scrum framework. Team members worked in timeboxed two-week sprints to prioritize, refine, and fulfill user stories, and conducted a sprint review for each. The team repeated this agile process to reach the tool’s MVP milestone.
4. Pilot Testing and Refinement
After the alpha release of the B2H Toolkit, the team partnered with the Texas Digital Library (TDL) to launch the B2H Toolkit Pilot Test Program to assess the functionality and usability of the tools developed. This structured testing included three main activities: (1) customizing Hyku for consortial needs, (2) confirming the B2H Toolkit’s compatibility with several migration paths, and (3) identifying and prioritizing additional features for tool enhancement in phase three of the B2H project. In addition to pilot testing, the CDM Bridge alpha version was introduced to the Samvera community.
Customizing Hyku for consortial needs focused on adapting Hyku to use a local metadata application profile (MAP), such as UH’s Bayou City Digital Asset Management System (BCDAMS)-MAP [21]. Through this work, both the University of Houston and University of Victoria were able to integrate their MAPs into the TDL-hosted Hyku instance.
The pilot project assessed the different components of the B2H Toolkit, including the toolkit website’s migration strategies, the CDM Bridge software, and the HyBridge software. Through testing, the team aimed to confirm that the tools were able to migrate metadata and files from CONTENTdm to Hyku in the following four paths:
Local CONTENTdm instance to local Hyku instance;
Local CONTENTdm instance to hosted Hyku instance;
Hosted CONTENTdm instance to hosted Hyku instance;
Hosted CONTENTdm instance to local Hyku instance.
Both the University of Houston and University of Victoria were able to successfully migrate metadata from CONTENTdm to HyBridge. The three-month pilot testing provided valuable information for improvements and added features. The resulting changes (1) adjusted HyBridge to convert GeoNames into URIs for Hyku, (2) provided warning messages within Hyku for invalid RightsStatements.org URIs and invalid work types, (3) improved HyBridge’s compatibility with both Hyku and Hyrax repositories, and (4) included ingest status information (staging, processing, complete).
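Two of these refinements, converting GeoNames references into URIs and warning about invalid rights statement URIs, can be sketched in Python as follows. This is an illustration under stated assumptions, not HyBridge's implementation: it assumes the input is a numeric GeoNames identifier, that the target form is the `https://sws.geonames.org/<id>/` URI, and that valid rights values use the `http://rightsstatements.org/vocab/<code>/1.0/` form with one of the twelve published statement codes. The function names are hypothetical.

```python
import re

GEONAMES_URI_TEMPLATE = "https://sws.geonames.org/{id}/"  # assumed target URI form

# Statement codes published by RightsStatements.org (version 1.0).
RIGHTS_CODES = {
    "InC", "InC-OW-EU", "InC-EDU", "InC-NC", "InC-RUU",
    "NoC-CR", "NoC-NC", "NoC-OKLR", "NoC-US",
    "CNE", "UND", "NKC",
}
RIGHTS_PATTERN = re.compile(r"http://rightsstatements\.org/vocab/([A-Za-z-]+)/1\.0/")


def geonames_uri(geonames_id):
    """Build a GeoNames URI from a numeric identifier (hypothetical helper)."""
    text = str(geonames_id).strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"not a numeric GeoNames identifier: {geonames_id!r}")
    return GEONAMES_URI_TEMPLATE.format(id=text)


def rights_warning(uri):
    """Return None when the URI is a known rights statement, else a warning string."""
    match = RIGHTS_PATTERN.fullmatch(uri.strip())
    if match and match.group(1) in RIGHTS_CODES:
        return None
    return f"Invalid RightsStatements.org URI: {uri!r}"


if __name__ == "__main__":
    print(geonames_uri("4699066"))
    print(rights_warning("http://rightsstatements.org/vocab/InC/1.0/"))
    print(rights_warning("http://rightsstatements.org/vocab/Bogus/1.0/"))
```

Returning a warning string rather than raising an error matches the behavior described above, where invalid values surface as messages to the user instead of halting the ingest.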
In addition to the pilot testing, the team collected feedback from the Samvera community on the alpha release of the CDM Bridge and incorporated the following improvements: metadata mapping at the file level and software performance for large collections.
5. Toolkit Promotion and Feedback
Toolkit promotion was a major component of the B2H grant work. Team members offered workshops and presented on the B2H Toolkit at many library and library technology conferences. The “Strategies and Tools for Digital Repository Selection and Migration” workshop was offered twice and guided participants in generating user stories to inform system selection and migration goals. The workshop content walked through the migration planning and strategy questions to help participants plan their migration work. This workshop had a broad audience but was designed for digital repository managers, metadata librarians, and digital services librarians. Participants’ questions and feedback influenced further website content development to meet users’ needs. For example, participants asked for more specific details about convening focus groups and focus group questions. In response, the B2H team wrote the “Questions for Focus Groups” blog post [22] in order to share these details more widely.
The B2H Toolkit was designed primarily with digital cultural heritage collection use cases in mind, but workshop participants were considering or planning migrations of their institutional repository content and archival finding aids as well. Reviewing the migration guidelines and best practices content with these other types of migration in mind ensured that the migration strategies were as widely applicable as possible.
The team also conducted over a dozen conference presentations at national and international conferences. They received positive anecdotal feedback on the need to expand the scope of the developed migration tools to facilitate migration from other proprietary digital repositories to Hyku. This would make the Toolkit applicable to a wider audience.
6. Toolkit Sustainability
As the work of the B2H Toolkit grant project concludes, project team members recognize that an ongoing commitment will be required to meet repository migration practitioners’ current and future needs. As such, the team has devised several strategies to address toolkit sustainability. First, they will work with the Hyku community to integrate the HyBridge tool into the Hyku source code. This work will ensure that HyBridge is integrated into future Hyku releases, making it more accessible to institutions that lack the software development resources needed to complete the integration on their own. Second, the project team will complete a sustainability outlook report. This document will highlight features for the community to consider for future development. It will also forecast potential barriers, risks, and limitations that the community should be tracking as development progresses, such as the introduction of new metadata standards into Hyku and the issues that would arise in the event that OCLC makes changes to the CONTENTdm API. Articulating these opportunities and challenges will allow project team members and the larger community to better anticipate needs as the B2H Toolkit and Hyku evolve over time.
7. Conclusions
The B2H Toolkit offers the comprehensive migration strategies and essential tooling needed to facilitate a robust migration from CONTENTdm to Hyku. The migration strategies provide repository practitioners with an end-to-end overview of the process. Documentation encourages practitioners to consider an array of topics such as conforming to industry standards and best practices, analyzing the quality of collection and item-level metadata, and understanding end-user needs. The B2H software automates the export of materials from CONTENTdm and the ingest of data into Hyku. Coupling the strategies and tooling, the B2H Toolkit mitigates migration barriers and facilitates a more efficient process for practitioners.
Beyond toolkit components, the achievements of the B2H project showcase the benefits of collaborative, community-driven initiatives. This could not be timelier, as studies show rising interest in adopting open source repository solutions. Working across institutions, the B2H project team was able to build, test, revise, and release resources that fill gaps in the migration process, gaps that, in part, prevent institutions from adopting open source repositories like Hyku. The B2H Toolkit project presents a successful case study for developing resources that bridge local workflows and practices (such as metadata creation and remediation or digitization) and open source community efforts (such as Hyku). The B2H Toolkit project can also serve as a blueprint for expanding migration bridges from other repositories, such as Bepress’s Digital Commons, to Hyku, making open source solutions accessible to even more institutions. Through the generous support of IMLS, the contributions of a dedicated project team, and the generous feedback of community stakeholders, the B2H Toolkit exists to educate practitioners on the DAMS migration process, provide strategies to successfully complete the migration, and accelerate the adoption of open source solutions for digital object stewardship moving forward.