Article

What Is an Institutional Repository to Do? Implementing Open Access Harvesting Workflows

Rachel Smart
Office of Digital Research and Scholarship, Florida State University, Tallahassee, FL 32306, USA
Publications 2019, 7(2), 37; https://doi.org/10.3390/publications7020037
Submission received: 16 March 2019 / Revised: 20 April 2019 / Accepted: 23 May 2019 / Published: 27 May 2019
(This article belongs to the Special Issue Selected Papers from Open Repositories 2018)

Abstract

In 2016, Florida State University adopted an institutional Open Access policy, and the library staff were tasked with implementing an outreach plan to contact authors and collect publication post-prints. In 2018, I presented at Open Repositories in Bozeman to share our workflow, methods, and results with the repository community. This workflow uses both proprietary and open source methods to obtain and create research metadata and to reach out to authors so that their work becomes more easily accessible and citable. Post-print deposits added through this workflow remain in the double digits for each year since 2016. As at many institutions before us, participation rates for article deposit in the institutional repository are low, and it may be too early in the implementation of this workflow to expect a real change in faculty participation.

1. Introduction

This content recruitment workflow is a component of a larger Open Access (OA) implementation plan proposed to the Florida State University (FSU) Faculty Senate by University Libraries staff following the adoption of an institutional OA policy in 2016. The challenge and goal of this plan are to scale repository operations by gathering author content and encouraging faculty to submit their manuscripts without the robust technical services of a commercial solution such as Symplectic Elements. Commercial solutions were initially explored; however, it was soon determined that the annual cost to the library was too high to support. FSU Libraries operates its institutional repository (IR) on the open source platform Islandora, having migrated from bepress in 2015. This migration occurred well before Elsevier's acquisition of bepress, an acquisition that prompted many institutions to look at open source platforms for an IR that supports their community's open scholarship. FSU's migration from bepress to Islandora was made possible by additional resources: a newly hired developer, pre-existing familiarity with Drupal, consortial hosting through Florida Virtual Campus (FLVC), and an active development community [1].
In the last four years, the repository has expanded to include content other than theses and dissertations, such as capstone projects, technical/research reports, video ethnographies, journal articles, undergraduate research posters and honors theses, 3D models, and data sets. Following the adoption of the OA policy, a small team of librarians in the Office of Digital Research and Scholarship developed a plan to address low faculty self-submission rates of journal publications to the repository. This plan pairs a metadata harvesting workflow and semi-automated metadata record creation with outreach emails to researchers about their recent publications, providing them with information about the OA policy and the opportunity to upload their manuscript to the IR. Knowledge shared by the open community was invaluable in constructing this workflow, and the goal of this work is to highlight those key resources as well as to report on our progress in recruiting content.

Literature Review

A historical overview of institutional repositories, their purpose, and their development is outside the scope of this review. A large body of literature surveys IR deposit rates, much of it concluding that average self-archiving rates are low across institutions. This review instead covers literature that examines the factors influencing faculty IR participation and content recruitment, as well as the strategies libraries have developed to address low self-archiving rates.
By Clifford Lynch's 2003 definition of a “mature” institutional repository, FSU's research repository, DigiNole, is well on its way to becoming one [2]. Starting with theses and dissertations, the IR expanded by curating faculty and student research publications, as well as teaching materials and gray literature. Institutional records and heritage materials are under the purview of the University Archivist and are not part of DigiNole. Recently, we have started uploading data sets to the IR and linking them to research publications to fulfill public access mandates. Despite this growth, DigiNole experiences what seems generally true for repositories: a low faculty participation rate.
Many publications over the years have examined IR growth rates and faculty deposit rates. In his 2008 American Library Association (ALA) presentation, Royster explained that he had expected faculty members to see the value in the IR themselves after he met with them at department meetings, but the self-submission rate was less than 10% [3]. After he moved to a more mediated deposit method, rates improved to 15–20%, with most of the material contributed by the Physics department [3]. Physics already has an established culture of open access through the posting of pre-prints on arXiv, which likely contributes to the department's comparatively high compliance rate. Royster found that locating articles and checking their publishing permissions before contacting the author resulted in a response rate of 90% [3]. What exactly motivates authors to contribute, and what prevents them from doing so?
In a 2014 publication, Dubinsky examined the growth rate of IRs operating on the Digital Commons platform and found that the lack of faculty participation can be attributed to copyright concerns, a preference for depositing in a disciplinary repository, a difficult submission process, fears of plagiarism, and a general lack of awareness of the IR [4]. A 2015 survey of Texas A&M University (TAMU) faculty by Yang and Li also found that faculty members had a significant lack of awareness about the IR and concerns regarding copyright [5]. The authors did not specify the kinds of copyright concerns that faculty had. They determined that the IR would benefit from much more focused promotion, although there seemed to be an overall lack of knowledge about the existence of library resources on campus [5]. As Royster's experience shows, promoting the IR is not the solution IR managers want it to be, whereas reviewing copyright permissions before reaching out to faculty members produced a much greater response.
There is also evidence that faculty members perceive pre-print and post-print versions of articles to be of lower quality than the final PDF files posted by publishers [5]. An article posted as a .docx file may appear incomplete; however, the only differences between a post-print and the publisher PDF should lie in the typesetting. If the content is peer-reviewed and no different from what the publisher distributes, is the objection merely to the appearance of the document? Yang and Li do not explain why faculty members consider these versions to be of lower quality, noting only that many faculty members see OA journals as less prestigious and of lower quality, and view post-prints deposited in the IR the same way [5]. They also collected faculty members' opinions about TAMU possibly adopting an OA mandate. Although OA policies are widely adopted, TAMU faculty are skeptical of the value of an OA policy for obtaining grant funding or making their research more visible, and some are intensely opposed, believing it violates their academic freedom [5]. FSU's OA policy passed unanimously, but it included a waiver for faculty members who want the choice to opt out of the non-exclusive license. Only two faculty members have submitted a waiver, and the waiver form remains available on FSU's OA policy website.
It seems that the value of a repository and OA ideals are not enough to significantly increase faculty participation. What workflows, then, are librarians creating to address faculty reluctance to submit their work and their concerns about copyright, while also educating faculty about the existence and benefits of the IR? In 2011, Hanlon and Ramirez published the results of a survey of IR managers that identified copyright clearance trends in staffing and workflows. Most respondents used a mediated submission model in which librarians, rather than faculty, checked copyright permissions [6]. Kansas State University (KSU) reported checking publisher permissions with SHERPA/RoMEO, an online resource of publisher OA policies, or on the publisher website if the journal had no entry in SHERPA [7]. Interestingly, although many respondents relied on SHERPA or other copyright directories, most did not share the copyright information they found with other institutions by submitting updates or entries to community-driven services like SHERPA [6]. Many institutions evidently depend heavily on SHERPA/RoMEO for copyright clearance [3,6,7]. Our own workflow would not be possible without SHERPA's API.
In 2013, Madsen and Oleen shared their original outreach workflow, for which the IR manager is responsible. In summary, the IR manager contacts faculty members about the benefits of the IR, checks publisher permissions with SHERPA/RoMEO, obtains the manuscript from faculty, creates a metadata record and cover page for the manuscript, and replies to faculty with the repository URL [7]. KSU eventually evolved this workflow by developing a workflow management system with a local web client, used to track submissions and to create metadata from citations downloaded from external databases [7]. Their workflow is very similar to the content recruitment workflow I describe below. With little funding and minimal staffing, libraries are essentially forced to find creative solutions and, in some cases, to build their own systems.

2. Methods

The methods described in this report rely on indexing services like Web of Science (WoS) or Scopus to gather citation data. Other tools, such as Zotero, Google Sheets, and OpenRefine, are free to use, and some are open source. In 2016, FSU Library staff developed and implemented a green open access plan to reach out to authors for post-prints to deposit into the repository, in an effort to make research output more freely available. Because commercial solutions for aggregating author content were not financially viable, library staff were in a position to offer creative, albeit patchwork, solutions.

2.1. Campus Support

In February 2016, the Florida State University (FSU) Faculty Senate unanimously adopted an Open Access (OA) policy granting the university a nonexclusive license to preserve a copy of published scholarly output in the institutional repository, making it accessible to the public and the wider research community. Following the adoption of the OA policy, the Scholarly Communication Librarian and a former graduate student were tasked with implementing outreach strategies and preservation workflows designed to provide green open access to the published research of the FSU community. The Open Access Advisory Board, consisting of 10–15 faculty members from various departments, supports the Scholarly Communication Librarian and the Repository Specialist by providing feedback on outreach initiatives.

2.2. Developing Outreach Implementation Plans

In an effort to address low faculty self-submission, members of the Digital Research and Scholarship (DRS) office researched methods of gathering faculty scholarship in a broad, ideally automated way. Commercial solutions that offer data collection, systems integration, and reporting services were considered. A representative for Symplectic Elements met with FSU librarians about their software and services; however, the final quote for the starting price of the service package was far more than the library was able to pay. Commercial solutions were increasingly out of reach.
The idea for our current workflow originated in a post on ALA's Scholcomm listserv. The original posters were implementing a mediated submission process and were looking for advice on how to check copyright on articles in batch. Someone from Loyola Marymount University replied to the post, detailing their method of checking copyright permissions with the SHERPA/RoMEO API and automating the checks in batches with Google Sheets. They supplied a link to their template, and our automated harvesting workflow later incorporated that process as its copyright evaluation step. This addresses another issue faced by repository managers: the unwieldy task of tracking the changing copyright policies of journals and their publishers to determine which version of an article can be archived in the institutional repository [8]. In 2013, the Code4Lib Journal published an article titled “Using XSLT and Google Scripts to Streamline Populating an Institutional Repository”, which described the creation of two scripts: (1) one that exports citations as XML files and transforms the records into Dublin Core for batch loading into DSpace; and (2) one that uses the SHERPA/RoMEO API to identify journal copyright permissions and displays the results in Google Sheets [9]. Our team expanded these methods to develop a workflow that would:
  • Identify Florida State affiliated publications;
  • Develop a submission process that would require minimal work from authors;
  • Decrease time intensive tasks for the repository manager;
  • Increase author submission rates;
  • Increase the number of unique author submissions.
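The core of the batch permission check is a per-journal lookup that turns a publisher policy into a deposit decision. The Python sketch below is not the Loyola Marymount template or the Code4Lib script (both run in Google Sheets); it only illustrates the decision step. It parses a response shaped like the older SHERPA/RoMEO XML API, and the element names (`publisher`, `postarchiving`, `pdfarchiving`, `romeocolour`) and values (`can`, `restricted`, `cannot`) are assumptions modelled on that service, not a current schema. The helper `deposit_version` is hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET

# Sample response; element names and values are assumptions modelled on the
# older SHERPA/RoMEO XML API, not a live schema.
SAMPLE = """<romeoapi>
  <publishers>
    <publisher>
      <name>Example Society Press</name>
      <prearchiving>can</prearchiving>
      <postarchiving>restricted</postarchiving>
      <pdfarchiving>cannot</pdfarchiving>
      <romeocolour>green</romeocolour>
    </publisher>
  </publishers>
</romeoapi>"""


def deposit_version(xml_text: str) -> str:
    """Map a RoMEO-style policy response to the version an IR may archive (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    publisher = root.find(".//publisher")
    if publisher is None:
        # No entry for the journal: fall back to checking the publisher website by hand.
        return "check publisher website"
    pdf = publisher.findtext("pdfarchiving", default="unknown")
    post = publisher.findtext("postarchiving", default="unknown")
    if pdf == "can":
        return "publisher PDF"
    if post in ("can", "restricted"):
        # "restricted" is read here as "allowed with conditions", e.g. an embargo.
        return "accepted manuscript"
    return "no self-archiving"


print(deposit_version(SAMPLE))
```

Treating "restricted" as depositable is a design choice for the sketch; in practice those conditions (embargo length, required statements) would still need a human check.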

2.3. Getting Creative with Workflows

This section gives a step-by-step description of the outreach workflow but does not go into technical depth. Our team built upon the methods shared by Loyola Marymount University and developed a workflow that identifies thousands of FSU-affiliated publications, exports their metadata from Web of Science, verifies copyright, normalizes and packages the metadata, and transforms it into Metadata Object Description Schema (MODS) records for batch upload into the repository. The workflow has three stages, detailed in Figure 1: (1) collecting metadata from Web of Science and using that data to create MODS records; (2) reaching out to authors to upload their accepted manuscripts; and (3) uploading the OA articles identified in Stage 1. (The workflow toolkit can be found online at www.doi.org/10.17605/OSF.IO/4MZ7Y.)
To begin Stage 1, articles are identified by organizational affiliation and date range. For example, the query “ORGANIZATION-ENHANCED: (Florida State University) AND YEAR PUBLISHED: (2018)” produces 3105 results, of which 2461 have the document type “article”. We chose not to filter by document type. Once we have confirmed that the results reflect the intended search, we save the full records in EndNote desktop format. Note that Web of Science allows only 500 records to be downloaded at a time. The resulting files are then merged into one dataset using Zotero, which can read EndNote files. We chose the EndNote format because the tab-delimited files produced by WoS do not include the detailed attributes we use to create the MODS XML elements; using them would require more work to cross-map the metadata.
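Because of the 500-record download limit, a year's result set has to be exported as a series of consecutive record ranges. The small Python sketch below only works out those ranges; `export_ranges` is a hypothetical helper, not part of Web of Science or Zotero.

```python
def export_ranges(total: int, batch: int = 500) -> list[tuple[int, int]]:
    """Return inclusive (first, last) record ranges for exports capped at `batch` records."""
    return [(start, min(start + batch - 1, total))
            for start in range(1, total + 1, batch)]


ranges = export_ranges(3105)
print(len(ranges), ranges[0], ranges[-1])
```

For the 3105 results of the 2018 example query, this gives seven ranges, from records 1–500 up to 3001–3105, so seven EndNote files would need to be merged in Zotero.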
The merged dataset is exported from Zotero as a comma-separated values (CSV) file and then uploaded to Google Sheets. Google provides a script editor that allows the creation of functions that call SHERPA/RoMEO's API and fill the selected cells with the desired value. The purpose of this process is to identify publishers' self-archiving permissions for a large volume of records. This saves the repository manager a significant amount of time and makes the entire workflow possible. Even when everything is working properly, this is the part of the workflow that takes the longest to complete: it took around two hours to make all the API calls for 600 records. To avoid timeouts, we advise calling the API for only 100 cells at a time. It is also important to copy the results and paste them back as values after each batch; otherwise, when the sheet is closed and reopened, it will call the API again for every cell containing the call function, and the sheet will crash.
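Two hours for 600 records is roughly 12 seconds per record, so every avoided call matters. The Python sketch below is not the Google Sheets script; it only illustrates the two mitigations described above, under stated assumptions: work through lookups in batches of 100, and keep each answer as a stored value. It also assumes that many articles share a journal, so it deduplicates ISSNs and queries each journal once. `fetch_permission` is a hypothetical stand-in for whatever function performs the real API call.

```python
from typing import Callable, Iterable


def batched_lookup(issns: Iterable[str],
                   fetch_permission: Callable[[str], str],
                   batch_size: int = 100) -> dict[str, str]:
    """Query each distinct ISSN once, working through them in fixed-size batches."""
    # dict.fromkeys keeps first-seen order while dropping duplicates; blanks are skipped.
    unique = list(dict.fromkeys(issn for issn in issns if issn))
    results: dict[str, str] = {}
    for start in range(0, len(unique), batch_size):
        for issn in unique[start:start + batch_size]:
            results[issn] = fetch_permission(issn)
        # In the spreadsheet version, this is the point to paste results as values.
    return results


# Usage with a stand-in fetcher that records which ISSNs it was asked for.
calls: list[str] = []


def fake_fetch(issn: str) -> str:
    calls.append(issn)
    return "green"


permissions = batched_lookup(["1234-5678", "1234-5678", "8765-4321", ""], fake_fetch)
print(len(calls), permissions)
```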
When the permission checks are complete, it is time to create the MODS records. The result set is exported as an Excel file, which in our experience loads into OpenRefine more reliably than a CSV for this particular workflow. The data are prepared in OpenRefine: internal IDs (IIDs) are created, and column names are changed to correspond with MODS elements so that the PHP script can transform the data into that schema. The finished product is exported as a JSON file, and the PHP script then reads the JSON data and generates a MODS record for each item. The resulting records are uploaded to Google Drive for storage and ease of access.
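FSU's transformation is written in PHP; the Python sketch below only illustrates the same idea of turning one JSON row from OpenRefine into a MODS record. The JSON keys (`title`, `authors`, `journal`, `year`, `doi`, `iid`) and the use of `identifier type="IID"` are hypothetical stand-ins for the renamed OpenRefine columns; the MODS v3 namespace and the element names `titleInfo`, `name`, `namePart`, `originInfo`, `dateIssued`, `relatedItem` and `identifier` are standard MODS. The sample record is made up.

```python
import json
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row: dict) -> ET.Element:
    """Build a minimal MODS record from one JSON row (hypothetical column names)."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["title"]
    for author in row.get("authors", []):
        name = ET.SubElement(mods, q("name"), type="personal")
        ET.SubElement(name, q("namePart")).text = author
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateIssued")).text = str(row["year"])
    # The journal is described as the "host" item the article belongs to.
    host = ET.SubElement(mods, q("relatedItem"), type="host")
    host_title = ET.SubElement(host, q("titleInfo"))
    ET.SubElement(host_title, q("title")).text = row["journal"]
    if row.get("doi"):
        ET.SubElement(mods, q("identifier"), type="doi").text = row["doi"]
    ET.SubElement(mods, q("identifier"), type="IID").text = row["iid"]
    return mods


rows = json.loads('[{"title": "An Example Article", "authors": ["Doe, Jane"], '
                  '"journal": "Example Journal", "year": 2018, '
                  '"doi": "10.1234/example", "iid": "FSU_example_001"}]')
for row in rows:
    print(ET.tostring(row_to_mods(row), encoding="unicode"))
```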

2.4. Reaching Out to Authors

Stage 2, the author manuscript solicitation workflow, branches off from the initial data collection from Web of Science. Using Microsoft Outlook, Excel, and Word, we execute a Mail Merge with our gathered metadata as the data source. Unfortunately, there is currently no way to automate filling in FSU authors' contact information in the data spreadsheet, so this step is done manually. In Spring 2018 and Fall 2018, we employed undergraduate student workers to locate current author email addresses and enter them into the spreadsheet.
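The Office Mail Merge needs no code, but the underlying operation is simple: fill one message template per spreadsheet row and skip rows whose email address has not yet been entered. The Python sketch below assumes hypothetical column names (`author`, `email`, `title`) and uses placeholder wording, not FSU's actual email text.

```python
import csv
import io
from string import Template

# Placeholder wording; not the text of FSU's outreach email.
MESSAGE = Template(
    "Dear $author,\n\n"
    "Under FSU's Open Access policy, you may deposit the accepted manuscript of "
    "\"$title\" in DigiNole. Please reply with the file attached.\n"
)


def mail_merge(csv_text: str) -> list[tuple[str, str]]:
    """Return (recipient, body) pairs, skipping rows that still lack an email address."""
    messages = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        email = (row.get("email") or "").strip()
        if not email:
            continue  # contact details not yet looked up by a student worker
        messages.append((email, MESSAGE.substitute(row)))
    return messages


data = ("author,email,title\n"
        "Jane Doe,jdoe@example.edu,An Example Article\n"
        "John Roe,,Another Article\n")
for recipient, body in mail_merge(data):
    print(recipient)
    print(body)
```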
The initial email template contained blocks of text describing the Open Access policy and the benefits of uploading articles to an open access repository. Authors who chose to submit their article needed only to reply with the document attached, and we created the metadata through a mediated submission process. Responses from authors started strong but began to dwindle, so the message was redesigned as a shorter, more direct request for authors to upload their content through a minimal submission form. Explaining what a “post-print” is, in comparison with the publisher PDF, is always a challenge. The term “accepted author manuscript” seems better received in conversations with faculty members; however, it is still helpful to remind submitters that accepted versions do not contain publisher branding, unless the article is OA. It is not clear whether the redesigned email made any difference to authors: responses are consistent, but still low.

2.5. Workflow Manager

At first, student workers used Trello to track author submissions. Authors attached their manuscripts to emails, and the student workers downloaded them and uploaded them to Trello cards. The cards held the various versions and packages created while readying the document and metadata files for deposit into DigiNole. Currently, there is more than one way to submit to DigiNole in this workflow. We created a minimal submission form for authors to upload their manuscripts and provide departmental information. An added benefit of the form is a custom script our repository developer wrote using the Drupal 7 Rules module. This script executes whenever a node of the type “Scholarship Submission” is saved with the status “Ingested”, and it logs the information provided in the form fields as a row in a CSV file. This is helpful for reporting, and it archives submission information that would otherwise be lost when the Drupal nodes are deleted.
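The logging itself is a Drupal 7 Rules action, written in PHP. The Python sketch below only mirrors its described logic: when a submission is saved with the status “Ingested”, its form fields are appended to a CSV log; any other status is ignored. The field names and the `log_if_ingested` helper are hypothetical.

```python
import csv
import io

# Hypothetical form fields; the real "Scholarship Submission" fields may differ.
FIELDS = ["node_id", "title", "author", "department", "status"]


def log_if_ingested(submission: dict, log_file) -> bool:
    """Append the submission to the CSV log only when its status is 'Ingested'."""
    if submission.get("status") != "Ingested":
        return False
    # Missing fields are written as empty strings; extra fields are ignored.
    writer = csv.DictWriter(log_file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writerow(submission)
    return True


log = io.StringIO()
csv.DictWriter(log, fieldnames=FIELDS).writeheader()
log_if_ingested({"node_id": 42, "title": "Example", "author": "Doe, Jane",
                 "department": "Psychology", "status": "Ingested"}, log)
log_if_ingested({"node_id": 43, "title": "Draft", "status": "Pending"}, log)
print(log.getvalue())
```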

2.6. Including Open Access Articles

Articles published open access are also included in the initial content download from WoS at the beginning of the workflow. After the permissions process is complete, these OA articles are moved from the main spreadsheet into a separate one. Student workers are given the task of collecting the open access articles and pairing them with a MODS record for deposit into the repository. Because the primary purpose of this outreach program is to provide access to research that is otherwise behind a paywall, gathering articles that are already openly accessible is treated as a form of preservation and is not counted as an indicator of success.
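Separating OA articles from those that need outreach amounts to a partition on whichever column records access status. The Python sketch assumes a hypothetical `oa` column that is non-empty for open access articles; the actual column name and values in FSU's spreadsheets are not given here, and the sample records are made up.

```python
def split_open_access(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (open_access, needs_outreach) using a hypothetical 'oa' column."""
    open_access: list[dict] = []
    needs_outreach: list[dict] = []
    for record in records:
        is_oa = bool((record.get("oa") or "").strip())
        (open_access if is_oa else needs_outreach).append(record)
    return open_access, needs_outreach


records = [{"title": "A", "oa": "gold"}, {"title": "B", "oa": ""}, {"title": "C"}]
oa, outreach = split_open_access(records)
print([r["title"] for r in oa], [r["title"] for r in outreach])
```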

3. Results

At the time of the presentation at Open Repositories 2018 in June, 712 articles had been uploaded to the repository since the outreach method was first implemented in April 2017. From June 2018 until March 2019, 474 more articles were added to the repository, for a total of 1186. Because of DigiNole's reporting limitations, I am unable to pull exact counts of items deposited per month or items created during a particular year; these limitations are explained in the Discussion section. Figure 2 rather dramatically visualizes the inconsistent results of article deposit using the Web of Science workflow. Of the articles that faculty members published in 2016 (as indexed by Web of Science), 511 have been deposited using this workflow, and we are still collecting articles published in 2018 from authors (currently at 471). Figure 2 might suggest that only 204 of the articles published in 2017 were “grabbed” by the harvesting tool, but this is misleading. At least 400 open access articles were identified for 2017 via the Web of Science workflow; however, many of those articles overlapped with other harvesting workflows we ran that year for the Department of Biomedical Sciences, the College of Human Sciences, and the Florida Learning Disability Research Center. Fewer than 9% of these articles are post-prints delivered in response to outreach emails.
Over the course of this workflow, a total of 5448 well-formed, valid MODS records have been created from metadata downloaded from Web of Science. Only 21.7% of these metadata records have been packaged with an article and deposited in the repository, leaving 78.2% of them in storage, waiting to be paired with a post-print. As seen in Figure 3, 1482 MODS records were created for 2016, 1874 for 2017, and 2092 for 2018. Web of Science does not capture the full picture of faculty publishing behavior or the exact number of journal articles published every year. The increasing number of metadata records created each year could reflect growth in FSU-authored publications, but it is also possible that Web of Science indexes more journals each year and therefore captures more FSU-affiliated publications.
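The record counts can be checked against each other: the three annual batches sum to the reported total, and the reported percentages translate into approximate record counts (the two percentages sum to 99.9% because of rounding).

```latex
\begin{aligned}
1482 + 1874 + 2092 &= 5448 \quad \text{MODS records created} \\
0.217 \times 5448 &\approx 1182 \quad \text{records deposited with an article} \\
0.782 \times 5448 &\approx 4260 \quad \text{records awaiting a post-print}
\end{aligned}
```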
Approximately 2000 emails have been sent to authors to date using the Mail Merge. Batches of 200–350 are emailed about every other month, and we recently started sending a reminder email 2–3 weeks after the first email to those who did not respond. It is unclear whether the reminder emails are effective. Unlike Royster, FSU Libraries has not been fortunate with author responses, even with copyright pre-clearance and a mediated submission process. Roughly 5% of the emails in each round of mail merges result in a response, and half of those responses include the publisher PDF version. When we follow up with those who submit a publisher PDF, about a third reply with the accepted manuscript version. Given the low email response rate, it is important to question the effectiveness of the outreach workflow.
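The response figures imply a rough yield per mail-merge round. As a back-of-the-envelope sketch, assuming the approximate rates above hold uniformly (which they may not), each email yields about 0.05 × (0.5 + 0.5 × 1/3) = 1/30 of an accepted manuscript, or about 10 manuscripts from a 300-email batch. The function name and default parameters are illustrative only.

```python
def expected_manuscripts(emails: int,
                         response_rate: float = 0.05,
                         pdf_share: float = 0.5,
                         followup_success: float = 1 / 3) -> float:
    """Estimate accepted manuscripts obtained from one round of outreach emails."""
    responses = emails * response_rate
    direct = responses * (1 - pdf_share)                   # replies that were already the accepted manuscript
    recovered = responses * pdf_share * followup_success   # publisher PDFs replaced after follow-up
    return direct + recovered


for n in (200, 300, 350, 2000):
    print(n, round(expected_manuscripts(n), 1))
```

The estimate ignores reminder emails and any overlap between rounds, so it is only an order-of-magnitude check.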
The majority of authors who do respond with articles are unique submitters, meaning they have not self-submitted to the repository before. In some cases, these emails have successfully functioned as an outreach tool for promoting the services of the IR. In one case, a faculty member responded asking whether they could upload all of their other articles. This resulted in the deposit of 22 accepted manuscripts to the repository and 59 technical reports for an affiliated research center, the Center for the Study of Technology in Counseling and Career Development.
The total number of deposits to the IR, shown in Figure 4, increased steadily from 2015 to 2018. Outreach workflows were developed and implemented from 2016 to 2018, which likely accounts for the jump from 429 total objects in 2015 to 810 in 2016.

4. Discussion

Early in the workflow development process, overall tool functionality and increasing unique author submissions were identified as the two primary indicators of success. Overall, the tool functions as expected, although Stage 1 takes a long time to complete because it requires human mediation. All the applications, tools, and file formats involved in this workflow are currently supported by various communities and organizations, and the PHP scripts that generate MODS records from JSON files are supported by FSU Libraries' web development team. Despite the low author participation rate, the majority of respondents are unique submitters. A mediated submission process also has advantages, such as quality control over the metadata and files uploaded. We depend on Web of Science for metadata quality, and there are few known issues with the metadata we pull. During the MODS generation stage, the program flags errors in the metadata, and the IR manager manually locates the file and fixes the error. Out of the thousands of records in each annual batch, only 10–15 contain errors.
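The checks performed by FSU's PHP script are not described here, so the following is a hypothetical illustration only: a pre-generation pass that lists rows missing fields a MODS record would need, so the IR manager can repair them by hand. The required-field list is an assumption, and the sample rows are made up.

```python
REQUIRED = ("title", "authors", "journal", "year", "iid")  # assumed minimum fields


def flag_incomplete(rows: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (row index, missing fields) for every row that needs manual repair."""
    flagged = []
    for index, row in enumerate(rows):
        # Absent keys and empty values (e.g. "" or []) both count as missing.
        missing = [field for field in REQUIRED if not row.get(field)]
        if missing:
            flagged.append((index, missing))
    return flagged


rows = [
    {"title": "Complete", "authors": ["Doe, Jane"], "journal": "J", "year": 2018, "iid": "a1"},
    {"title": "", "authors": [], "journal": "J", "year": 2018, "iid": "a2"},
]
print(flag_incomplete(rows))
```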

4.1. Limitations

As the workflow currently stands, we have limited funds to support not just software but also staffing. The current goal is to find funding for a part-time position to assist with the outreach workflows. We rely heavily on student workers to process the bulk of the content. We could likely continue employing student workers to read email replies and upload ingest packages for deposit; however, the rest of the workflow would need to be automated as fully as possible.
There are also limitations to using the Islandora 7 platform, which has no robust reporting framework. Reports consist of object counts for collections on specific dates, and, technically, Solr search results can be downloaded as CSV files. These files contain a limited amount of metadata, and, as currently configured by FLVC, the Florida consortium that hosts DigiNole, Solr runs into errors on fields containing a large number of authors while generating the CSV file, halting the whole process. Currently, there is very little technical support for Islandora, and FLVC has no plan to continue developing the platform. A reporting framework for the outreach workflow will therefore need to exist outside of DigiNole, and with so many different applications involved, this is not easy to build. Ideally, I would like a more closed system for the workflow, similar to those described by Madsen and Oleen [7] and by Morrow and Mower [10].

4.2. Future Directions

Developing a reporting framework is the dream of this IR manager. Last June, one of our web developers created an archive log for Drupal nodes, populated as author self-submissions (and some mediated submissions) are processed for deposit in the IR. The rows of the archive log are filled with metadata fields from the submission forms as each form is marked as “Ingested”. Hopefully, in a few years there will be enough data to see meaningful deposit trends. Google Sheets are not an effective means of tracking submissions through the outreach workflow: the process is time-consuming, and humans are forgetful, even diligent ones. A system for tracking emails, submissions, and deposits would make the Web of Science outreach workflow more manageable.
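A tracking system of this kind could start as little more than a per-article status with a fixed set of allowed transitions. The Python sketch below is hypothetical: the stage names are drawn from the workflow steps described in this article, but no such system exists at FSU, and the DOI used is made up. Refusing out-of-order updates is what would keep counts per stage trustworthy for reporting.

```python
from enum import Enum


class Stage(Enum):
    HARVESTED = "harvested"   # MODS record created from Web of Science
    EMAILED = "emailed"       # outreach email sent
    REMINDED = "reminded"     # reminder sent after 2-3 weeks
    SUBMITTED = "submitted"   # manuscript received
    DEPOSITED = "deposited"   # ingested into the IR


ALLOWED = {
    Stage.HARVESTED: {Stage.EMAILED},
    Stage.EMAILED: {Stage.REMINDED, Stage.SUBMITTED},
    Stage.REMINDED: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.DEPOSITED},
    Stage.DEPOSITED: set(),
}


class Tracker:
    """Record each article's stage and refuse out-of-order updates."""

    def __init__(self) -> None:
        self.stages: dict[str, Stage] = {}

    def add(self, doi: str) -> None:
        self.stages[doi] = Stage.HARVESTED

    def advance(self, doi: str, new: Stage) -> None:
        current = self.stages[doi]
        if new not in ALLOWED[current]:
            raise ValueError(f"{doi}: cannot move from {current.value} to {new.value}")
        self.stages[doi] = new

    def count(self, stage: Stage) -> int:
        return sum(1 for s in self.stages.values() if s is stage)


tracker = Tracker()
tracker.add("10.1234/example")
tracker.advance("10.1234/example", Stage.EMAILED)
tracker.advance("10.1234/example", Stage.SUBMITTED)
print(tracker.count(Stage.SUBMITTED))
```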
As for recruiting content for the IR and developing new outreach goals, it may be productive to reach out to emeritus faculty and those seeking to solidify their legacy at FSU. There is also an effort underway to support and publish gray literature produced by the FSU community in the IR. The goal for summer 2019 is to create a comprehensive communications plan to complement the current work on the Web of Science outreach workflow. There is currently no official communications plan for the IR, and none existed before our migration from bepress in 2015. Discussions about uploading metadata-only records to the IR are being revived. Feelings are mixed about the utility of metadata-only records in the repository if the system is intended to provide access to a publication or object of some kind. However, it may be possible to integrate our IR platform with the faculty advancement software operated by the graduate school, in which case the records could be used to populate faculty profiles.

5. Conclusions

Author post-print submission rates are low; however, repository content has grown steadily since the workflow was first implemented in 2016. Ironically, the methods described in this report can be replicated only if institutions have paid access to journal indexing services like Web of Science (WoS) or Scopus to gather citation data. Decisions remain to be made about effective changes to the workflow and about creating a communications plan. There is also a need for a more robust reporting system to track submissions. Ultimately, librarians can survey and build repositories constantly, but as long as publishers have the final word on green open access, and until manuscript submission is rewarded in promotion and tenure review, these rates are unlikely to increase.

Funding

The APC was funded by Florida State University (FSU). FSU is a member of the MDPI institutional access program, which entitles FSU-affiliated authors to a 25% discount on all APCs. The remainder of the APC will be covered by FSU's Open Access Publishing Fund.

Acknowledgments

I want to thank Aaron Retteen and Devin Soper for their work developing the outreach program, and our student workers for providing operational support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Brown, B.J.; Soper, D. Migrating To An Open Source Institutional Repository. 2016. Available online: http://purl.flvc.org/fsu/fd/FSU_libsubv1_scholarship_submission_1462290278 (accessed on 24 February 2019).
  2. Lynch, C.A. Institutional repositories: Essential infrastructure for scholarship in the digital age. Portal Libr. Acad. 2003, 3, 327–336.
  3. Royster, P. How to Fill Your Institutional Repository, or, Practical Lessons I Learned by Doing. American Library Association Convention: Anaheim, CA, USA, 2008. Available online: http://digitalcommons.unl.edu/library_talks/40/ (accessed on 2 March 2019).
  4. Dubinsky, E. A Current Snapshot of Institutional Repositories: Growth Rate, Disciplinary Content and Faculty Contributions. J. Librariansh. Sch. Commun. 2014, 2, eP1167.
  5. Yang, Z.; Li, Y. University Faculty Awareness and Attitudes towards Open Access Publishing and the Institutional Repository: A Case Study. J. Librariansh. Sch. Commun. 2015, 3, eP1210.
  6. Abrizah, A. The cautious faculty: Their awareness and attitudes towards institutional repositories. Malays. J. Libr. Inf. Sci. 2017, 14, 17–37.
  7. Madsen, D.L.; Oleen, J.K. Staffing and Workflow of a Maturing Institutional Repository. J. Librariansh. Sch. Commun. 2013, 1, eP1063.
  8. Kingsley, D. Walking in Quicksand—Keeping up with Copyright Agreements. Australasian Open Access Strategy Group, 2013. Available online: https://aoasg.org.au/2013/05/23/walking-in-quicksand-keeping-up-with-copyright-agreements/ (accessed on 24 February 2019).
  9. Flynn, S.X.; Oyler, C.; Miles, M. Using XSLT and Google Scripts to Streamline Populating an Institutional Repository. Code{4}lib J. 2013, 19. Available online: http://journal.code4lib.org/articles/7825 (accessed on 3 March 2019).
  10. Morrow, A.; Mower, A. University Scholarly Knowledge Inventory System: A workflow system for institutional repositories. Cat. Classif. Q. 2009, 47, 286–296.
Figure 1. The order of the workflow and the several file conversions that happen along the way.
Figure 2. Deposit counts per year of articles gathered in the Web of Science workflow. Both open access articles and submitted author manuscripts are included in the totals. Of the articles published in 2016, 511 were deposited; 2017 yielded the fewest (204). We are still collecting articles for 2018.
Figure 3. The increasing number of metadata records created for each annual batch of data downloaded from Web of Science.
Figure 4. Growth of total items in the repository by year. Theses and dissertation objects were removed from the total count.

Share and Cite

MDPI and ACS Style

Smart, R. What Is an Institutional Repository to Do? Implementing Open Access Harvesting Workflows. Publications 2019, 7, 37. https://doi.org/10.3390/publications7020037

AMA Style

Smart R. What Is an Institutional Repository to Do? Implementing Open Access Harvesting Workflows. Publications. 2019; 7(2):37. https://doi.org/10.3390/publications7020037

Chicago/Turabian Style

Smart, Rachel. 2019. "What Is an Institutional Repository to Do? Implementing Open Access Harvesting Workflows" Publications 7, no. 2: 37. https://doi.org/10.3390/publications7020037

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
