Toward Easy Deposit: Lowering the Barriers of Green Open Access with Data Integration and Automation

Abstract: This article describes the design and development of an interoperable application that supports green open access with long-term sustainability and an improved user experience of article deposit. The lack of library resources and the unfriendly repository user interface are two significant barriers that hinder green open access. Tasked with implementing the open access mandate, librarians at an American research university developed a comprehensive system called Easy Deposit 2 to automate the support workflow of green open access. Easy Deposit 2 is a web application that is able to harvest new publications, to source manuscripts on behalf of the library, and to facilitate self-archiving to a university's institutional repository. The article deposit rate increased from 7.40% to 25.60% with the launch of Easy Deposit 2. The results show that a computer system can successfully carry out routine tasks to support green open access. Recent developments in digital repositories provide new opportunities for innovation, such as Easy Deposit 2, in supporting open access. Academic librarians are vital in promoting "openness" in scholarly communication, such as transparency and diversity in the sharing of publication data.


Introduction
A combination of several major developments has caused a surge of interest in open access (OA) in recent years. First, more funding agencies, such as the US National Institutes of Health and the US National Science Foundation, have made OA publishing mandatory for grantees [1,2]. Moreover, academic libraries are considering OA as a potential solution to maintaining their access to scientific and research literature [3].
OA is a newer form of scholarly publishing, which refers to scholarly literature that is free to read online [4]. Scholarly publishing nowadays is complex, and researchers have attempted to classify OA literature into subtypes based on factors such as the source and the license for re-use [5][6][7]. The focus of this article is green OA, the practice of authors depositing their post-print (peer-reviewed) manuscripts into a university's institutional repository for free public access. Green OA, or self-archiving, is usually the last option for authors if they cannot afford the article processing charges to publish in a gold or hybrid OA journal [8]. However, despite the benefits, recent studies find that only 10% of scholarly articles were self-archived by faculty authors, even at institutions with robust institutional repositories and OA policies [9,10]. Several barriers contribute to the small percentage of green OA. An early study attributes the lack of author self-archiving to inadequate marketing and the effort required for manuscript deposit [11], while more recent studies suggest that enforcement of the OA mandate by the university and funders is required in order to see more voluntary manuscript deposits from the faculty [12,13].

Support for Open Access at Oregon State University Libraries
The faculty members at Oregon State University (OSU), a public research university in the US, passed an OA policy in 2013 that requires every faculty member to grant OSU permission to make available his or her scholarly articles and to reproduce and distribute those articles for open access. OSU Libraries (OSUL) are in charge of implementing the OA policy by promoting the policy to faculty members and depositing accepted manuscripts (post-peer review, pre-typeset) of their articles to OSU's institutional repository (ScholarsArchive@OSU), also managed by the library [14]. An early workflow to implement the university's OA mandate required human intervention at every step. First, a librarian created a search alert in the Web of Science that found all of the articles authored or co-authored by the OSU faculty, and the results were sent to a dedicated OSU email address. A library staff member was assigned as the owner of that email account and coordinated the OA workflow. For each article listed in the Web of Science search alert, library staff took the following actions:
1. Searched SHERPA/RoMEO, an online database that collects publisher open access policies, to obtain the corresponding publisher's policies on copyright.
2. Found the email address of the contact author from the search alert.
3. Contacted the author to ask them to archive the article manuscript into ScholarsArchive@OSU.
4. If the author replied with the manuscript in an attachment, deposited it into ScholarsArchive@OSU on behalf of the author.
This manual workflow produced a high deposit rate, defined as the number of deposited articles divided by the total number of articles authored by OSU faculty, of around 44% between 2013 and 2015 [15]. However, OSUL had to abandon this workflow because, with limited resources, it could no longer commit a full-time staff member to OA alone. Faculty feedback suggested that the main barrier to self-archiving was the time and effort required to deposit articles into the institutional repository. For example, the faculty complained of the unfriendly repository user interface and the number of required metadata fields.
To provide sustainable and long-term support for green OA, several librarians and staff members at OSUL initiated a project that would not only automate many steps of the OA support workflow but also make article self-archiving as easy as clicking a button. The result of this project was Easy Deposit 2, a web application that harvests journal articles, automates OA promotion and outreach, and supports easy deposit into ScholarsArchive@OSU. This study of Easy Deposit 2 sought to answer the research question of whether a computer system could substitute for library staffing to support green OA. The rest of the article is outlined as follows:
• Methodology: A description of the technical details of automation at each main step of the OA support workflow and how all of the steps are integrated under Easy Deposit 2.
• Results and Discussion: A comparison of the numbers of article deposits in different periods (early workflow, with Easy Deposit 2 support, and faculty self-archiving) and a review of the lessons learned by supporting green OA with this new approach.
• Conclusion: An introduction to the future plan and an explanation as to why green OA is still valuable with the growth of other OA publishing models.
This paper demonstrates how the library can effectively utilize recent digital library developments to achieve sustainable support for the green OA of scholarly publications.

System Design of Easy Deposit 2
Easy Deposit 2 (ED2) was developed as a reboot of OSUL's support for green OA with automation. We named the proposed system Easy Deposit 2 because EasyDeposit had already been used as the name of another open-source toolkit. ED2 is a comprehensive system developed using the Ruby on Rails web application framework. It contains a MySQL database as the backend storage, a deposit interface, and a dozen scripts created for a variety of functions. The system searches the publisher's database for faculty publications and harvests their metadata records. The harvested metadata records are saved in ED2's database and processed to create the manuscript-recruiting emails. ED2 first looks for and uses the email address of the corresponding author in the metadata records. If the contact information is not available from public records, ED2 then searches OSU's electronic faculty directory for an email address matching the author's name. For every author with a detected email address, ED2 sends out a message asking the author to self-archive the article manuscript into ScholarsArchive@OSU. The authors can use the deposit link embedded in the email to upload and submit their manuscripts to OSU's institutional repository. The functions of ED2 automate most of the equivalent activities in the OA workflow that were previously carried out by library staff. The three modules of ED2 and their corresponding functions are summarized in Table 1, with key technical details such as the conditions that trigger workflow activities and the collection and integration of data from a variety of resources.

| Module | Functions Supported | Key Technical Details |
| --- | --- | --- |
| Fetch Module | Harvesting faculty-authored articles | The module queries the Web of Science database via API daily. Any articles newly published in the past four weeks by the university's faculty will trigger the OA workflow to recruit manuscripts. |
| | Parsing metadata | The module parses article metadata from harvested records and saves the parsed metadata records in the local database. |
| Parsing and Email Module | Recruiting article manuscripts | The module looks for author email addresses using either the publisher's application programming interface (API) or the university's directory service, and sends a manuscript-recruiting email to every author for whom an email address is found. |
| | Depositing article manuscripts into OSU's institutional repository | The author clicks the deposit link embedded in the article-recruiting email to initiate self-archiving, then uploads and ingests the manuscript into ScholarsArchive@OSU using ED2's deposit page. |
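The email lookup described for the recruiting step (use the corresponding-author email from the harvested metadata first, and fall back to a directory search by author name) could be sketched roughly as follows. This is only an illustration in Python, not ED2's Ruby on Rails code; the record layout, the `corresponding_emails` and `authors` keys, and the in-memory `directory` dictionary are all assumptions made for the example.

```python
from typing import Dict, List


def find_author_emails(record: Dict, directory: Dict[str, str]) -> List[str]:
    """Return the email addresses to contact for one harvested article.

    `record` and `directory` are hypothetical structures, not ED2's schema.
    """
    # Step 1: prefer corresponding-author emails carried in the metadata.
    emails = [
        e.strip().lower()
        for e in record.get("corresponding_emails", [])
        if e and e.strip()
    ]
    if emails:
        return emails

    # Step 2: otherwise look each author name up in the faculty directory.
    found: List[str] = []
    for name in record.get("authors", []):
        hit = directory.get(name.strip().lower())
        if hit and hit not in found:
            found.append(hit)
    return found


# Hypothetical sample data for illustration only.
directory = {"jane doe": "jane.doe@example.edu"}
with_email = {"authors": ["Jane Doe"], "corresponding_emails": ["J.Doe@Example.edu "]}
without_email = {"authors": ["John Roe", "Jane Doe"], "corresponding_emails": []}
unknown = {"authors": ["John Roe"]}

print(find_author_emails(with_email, directory))
print(find_author_emails(without_email, directory))
print(find_author_emails(unknown, directory))
```

An empty result corresponds to the case where no recruiting email can be sent for the article.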
The diagram in Figure 1 illustrates the functions and modules of ED2 and how ED2 interacts with external entities such as Web of Science, the authors, and the institutional repository. The only step in the previous workflow no longer covered by ED2 is verifying the publisher's self-archiving policies. We consulted the General Counsel of the University about whether that verification is necessary in light of the University's Open Access policy. The policy grants the University (and faculty) a nonexclusive license to distribute, at the very least, the accepted manuscript version of their articles. The counsel informed us that allowing faculty to deposit their articles without library mediation would protect the library from liability.

Recruiting and Depositing Manuscripts with Easy Deposit 2
ED2 uses the Web of Science API to retrieve, from the Web of Science core collections, journal articles published by faculty members in the last four weeks. The extended API returns the found articles in XML with metadata fields such as the title, author, abstract, and contact email of the corresponding author. ED2 parses and saves the metadata for each article in the local database, together with the "author publication pair," a data structure that stores the article identifier and the author's email. The primary purpose of the author publication pair is to retrieve the correct metadata record during the article deposit process. If an article has several corresponding authors, ED2 generates an author publication pair for every author. However, since April 2019, the email of the corresponding author has no longer been available from the Web of Science API. The impact of relying on commercial APIs for the future of ED2 is examined in the Discussion section.
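To make the "author publication pair" more concrete, the following Python sketch parses a small XML sample and builds one pair per corresponding-author email. The XML layout shown here is a simplified, hypothetical stand-in and not the actual Web of Science schema; the class and function names are likewise illustrative assumptions rather than ED2's code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical, simplified XML; the real API output is structured differently.
SAMPLE_XML = """
<records>
  <record id="WOS:0001">
    <title>Example article A</title>
    <abstract>Short abstract.</abstract>
    <email>A.Author@example.edu</email>
    <email>b.author@example.edu</email>
  </record>
  <record id="WOS:0002">
    <title>Example article B</title>
    <abstract/>
  </record>
</records>
""".strip()


@dataclass(frozen=True)
class AuthorPublicationPair:
    """Links one article identifier to one author email."""
    article_id: str
    author_email: str


def parse_records(xml_text):
    """Return (articles keyed by id, list of author publication pairs)."""
    root = ET.fromstring(xml_text)
    articles, pairs = {}, []
    for rec in root.findall("record"):
        article_id = rec.get("id")
        articles[article_id] = {
            "title": rec.findtext("title", default=""),
            "abstract": rec.findtext("abstract", default="") or "",
        }
        # One pair per corresponding-author email found in the record.
        for email in rec.findall("email"):
            if email.text and email.text.strip():
                pairs.append(
                    AuthorPublicationPair(article_id, email.text.strip().lower())
                )
    return articles, pairs


articles, pairs = parse_records(SAMPLE_XML)
for pair in pairs:
    print(pair.article_id, pair.author_email)
```

Under this sketch, an article with no email in its record produces no pair, which is the situation where a lookup in another source would be needed.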
The email sent to authors for recruiting manuscripts is generated using a template, in which contents such as the article title and deposit link are pulled from the ED2 database. It also includes links to OSU's OA policy and contact information for ScholarsArchive@OSU. Interested readers can find the template of the manuscript-recruiting email attached as a supplementary file of this article. The deposit link embedded in the article-recruiting email includes a "key" generated from the email of the contact author using the SHA-2 7 (Secure Hash Algorithm 2) cryptographic hashing standard. When the faculty member clicks that link, ED2 compares the SHA-2 key with the author's email stored in its database for authentication. To self-archive manuscripts into ScholarsArchive@OSU, an author only needs to interact with ED2's article deposit page, without the need to create an account for upload. Figure 2 shows the actual look-and-feel of ED2's deposit page as a screenshot taken with a randomly selected article for demonstration. Metadata fields such as title and abstract are auto-populated with values from ED2's database; the author only needs to upload the manuscript and submit it for self-archiving. ED2 packs the uploaded file and its metadata in the JSON (JavaScript Object Notation) format and ingests the package into ScholarsArchive@OSU over the web as an HTTP (Hypertext Transfer Protocol) request. ED2 has a dashboard for system administrators and librarians, providing information such as the total number of journal articles that have been harvested and how many of them have been self-archived by faculty members. Access to the administration dashboard is restricted to OSUL's librarians and staff members.
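The key-based deposit link can be illustrated with a short Python sketch. It assumes SHA-256 (one member of the SHA-2 family) applied to the normalized email address, a hypothetical base URL, and hypothetical query parameter names; the article does not specify these details, so this is one plausible reading rather than ED2's actual implementation.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://ed2.example.edu/deposit"  # hypothetical address


def make_key(email: str) -> str:
    """SHA-256 hex digest of the normalized email (assumed SHA-2 variant)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def deposit_link(article_id: str, email: str) -> str:
    """Build the link that would be embedded in the recruiting email."""
    return BASE_URL + "?" + urlencode({"article": article_id, "key": make_key(email)})


def is_authorized(link: str, stored_emails: dict) -> bool:
    """Check the key in the link against emails stored for that article.

    `stored_emails` maps an article id to the emails saved in its
    author publication pairs (a hypothetical in-memory stand-in).
    """
    params = parse_qs(urlparse(link).query)
    article_id = params.get("article", [""])[0]
    key = params.get("key", [""])[0]
    # Constant-time comparison of the supplied key with each expected key.
    return any(
        hmac.compare_digest(key.encode("utf-8"), make_key(e).encode("utf-8"))
        for e in stored_emails.get(article_id, [])
    )


stored = {"WOS:0001": ["a.author@example.edu"]}
link = deposit_link("WOS:0001", "A.Author@example.edu")
print(link)
print(is_authorized(link, stored))
```

Because a plain hash of an email address can be recomputed by anyone who knows the address, a deployment might prefer a keyed hash with a server-side secret; that is a suggestion of this sketch and not something the article describes.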

Results
The effect of ED2 on green OA was evaluated by the article deposit rate, a metric widely used by the OA community. In this study, the article deposit rate is defined as the number of articles deposited into ScholarsArchive@OSU divided by the total number of articles published by OSU faculty members. The deposit rates shown in Table 2 were calculated over four phases:

• Pre-WoS: The period before OSUL implemented the manual workflow using the Web of Science (WoS) index.
• WoS manual: The period when OSUL committed one full-time member of staff to recruit and deposit manuscripts on behalf of the faculty.
• Period in-between: The period between the cessation of the manual OA workflow and the launch of ED2.
• ED2 OA: The period when OSUL used ED2 to support OA.

7 SHA-2 key: https://ruby-doc.org/stdlib-2.4.0/libdoc/digest/rdoc/Digest/SHA2.html

Table 2. The trend of article deposit rates across four phases of open access support between 2011 and 2018: the period before OSUL implemented the manual workflow using the Web of Science index (Pre-WoS), the period when OSUL committed one full-time member of staff to recruit and deposit manuscripts on behalf of the faculty (WoS manual), the period between the cessation of the manual OA workflow and the launch of ED2 (Period in-between), and the period when OSUL used ED2 to support OA (ED2 OA).

The numbers in Table 2 were obtained by searching the underlying databases with appropriate metadata fields. For the number of "Articles Deposited," the author searched ScholarsArchive@OSU with the resource type set to "Article," and then separated the results by year using values in the "Date Created" field. For the number of "Total Articles," the author first submitted an address search to the WoS with the query "AD = 'Oregon State University'," and then used the "Year" facet to estimate the total number of articles published by the faculty. Since the launch of ED2, the article deposit rate has more than tripled, from 7.40% to 25.60%, compared to the rate of the previous three years (between 2015 and 2018). However, the article deposit rate with ED2 is about 18 percentage points lower (25.60% vs. 44.07%) than it was between 2012 and 2014, when green OA was supported by a full-time library staff member.
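The comparison of deposit rates can be reproduced with a few lines of arithmetic. The Python sketch below uses the three rates reported in the text (7.40%, 25.60%, and 44.07%); the article counts passed to `deposit_rate` are hypothetical and serve only to illustrate the definition.

```python
def deposit_rate(deposited: int, total: int) -> float:
    """Deposit rate in percent: deposited articles / total articles."""
    if total <= 0:
        raise ValueError("total number of articles must be positive")
    return 100.0 * deposited / total


# Hypothetical counts, used only to show how the definition is applied.
example_rate = deposit_rate(256, 1000)

# Rates reported in the text, in percent.
rate_before_ed2 = 7.40
rate_with_ed2 = 25.60
rate_manual = 44.07

# Relative change since the launch of ED2 (a ratio, not percentage points).
increase_ratio = rate_with_ed2 / rate_before_ed2

# Gap to the manual workflow, expressed in percentage points.
gap_points = rate_manual - rate_with_ed2

print(f"example rate: {example_rate:.2f}%")
print(f"ED2 vs. previous period: {increase_ratio:.2f}x")
print(f"manual workflow minus ED2: {gap_points:.2f} percentage points")
```

The ratio comes out at roughly 3.46 and the gap at roughly 18.47 percentage points, which keeps the relative change and the absolute difference clearly separated.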

The reported green OA deposit rate for published journal articles is about 12%, with considerable differences among research fields [16]. For instance, the green OA deposit rate for journal articles in the field of Library and Information Science ranged from 20% to 31% between 2012 and 2016 [17]. Overall, the green OA article deposit rate for most research fields appeared to be between 10% and 30%.

Discussion
Previous studies show that without outreach and a matching repository service, few faculty members would voluntarily deposit manuscripts into an institutional repository, even if this was a university mandate [11,12,15]. At the author's university, the article deposit rate for the most successful period of green OA was about 44%, when a full-time staff member was responsible for recruiting manuscripts and depositing them on the faculty's behalf. Most academic libraries in the US cannot afford to have a full-time staff member dedicated to supporting self-archiving. The research question of this study is whether a computer system could replace library staffing for the support of green OA. The results show that ED2 can significantly increase the number of authors who voluntarily deposit their manuscripts into OSU's institutional repository. The article deposit rate with ED2 was 25.60%, which is toward the top of reported green OA article deposit rates for most research fields. However, the success of green OA requires more than library support. Parties with influence and power, such as tenure and promotion committees and funding agencies, should reinforce the OA mandate in their policies.
The ED2 system was developed as an extension of OSUL's repository service and was integrated into OSU's institutional repository. There are existing efforts to harvest publications using the Web of Science API [18]. The Bibliographic Management System 8, developed by the Stanford University Library, inspired us to initiate the ED2 project. ED2 is an innovation in that it is the first comprehensive system specially developed to support green OA. It is the article deposit portal of ScholarsArchive@OSU, the software agent that automates the workflow of green OA, and the database of faculty publications with their OA status. A librarian can use ED2's dashboard to answer important questions such as how many articles faculty members have published in the last year and how many of them have been self-archived into ScholarsArchive@OSU.
Access to metadata records of scholarly publications has become the bottleneck of running ED2. The author email is vital for ED2 because it is required for the outreach, the recruiting of the manuscripts, and the authentication for deposit. ED2 was able to pull the corresponding author email(s) from the Web of Science API until April 2019, when the company decided to exclude author email(s) from the API output. An alternative solution was developed to obtain author emails by looking for author names in OSU's staff directory. However, of all the articles harvested by ED2, fewer than 50% are found to have at least one current faculty author. The lesson learned here is that the library cannot rely on a single commercial source for data, because the library has little influence over vendor decisions, and the priorities of the two parties are not aligned in many situations. Several popular APIs (Elsevier 9, Crossref 10, and ORCID 11) have been tested to determine whether they could replace the API from the Web of Science. The results show that none of the APIs include author contact information, and the availability of metadata, such as the abstract and copyright information in the API output, is inconsistent. A long-term solution to the problem of data accessibility may be for non-profit organizations, such as Crossref and ORCID, to collaborate and provide critical information such as copyright and contact details through their APIs.
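One way to reduce dependence on a single data source, in the spirit of the lesson above, is to fill each metadata field from the first source in a preference order that actually provides it, and to report which fields remain missing. The Python sketch below illustrates that idea only; the field names, the source labels, and the sample records are hypothetical and do not describe what any particular API returns.

```python
REQUIRED_FIELDS = ("title", "abstract", "license", "contact_email")


def merge_metadata(records_by_source: dict, source_order: list):
    """Fill each required field from the first source that has a value.

    Returns (merged record, provenance per field, list of missing fields).
    """
    merged, provenance = {}, {}
    for field in REQUIRED_FIELDS:
        for source in source_order:
            value = records_by_source.get(source, {}).get(field)
            if value:  # skip absent, None, and empty values
                merged[field] = value
                provenance[field] = source
                break
    missing = [f for f in REQUIRED_FIELDS if f not in merged]
    return merged, provenance, missing


# Hypothetical records for one article from three hypothetical sources.
records = {
    "source_a": {"title": "Example article", "abstract": ""},
    "source_b": {"title": "Example Article", "abstract": "Short abstract."},
    "directory": {"contact_email": "a.author@example.edu"},
}

merged, provenance, missing = merge_metadata(
    records, ["source_a", "source_b", "directory"]
)
print(merged)
print(provenance)
print("missing:", missing)
```

Recording the provenance of each field makes it easier to see how much of the workflow still depends on any one provider.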

Conclusions
Recent technological developments in digital repositories and web services provide new opportunities for innovation in supporting OA publishing. Easy Deposit 2 is a comprehensive system developed to promote OA and to facilitate author self-archiving. ED2 was designed to substitute for dedicated support staff by automating major OA tasks, such as harvesting faculty-published articles, sourcing manuscripts, and depositing articles into an institutional repository. ED2 significantly increased the number of article manuscripts voluntarily deposited by faculty members. The article deposit rate to OSU's institutional repository was raised from 7.40% to 25.60% with the support of ED2. However, a computer system like ED2 does not replace the knowledge, commitment, and value represented by professional librarians and library research support services.
The priority for the future development of ED2 will be to diversify the data sources, with a preference for community-based or non-profit organizations. The proposed methods include harvesting article metadata from Crossref and searching for author contact information in ORCID. Breaking the monopoly of commercial indices on publication data is critical for the long-term sustainability of supporting OA and promoting "openness" in scholarly communication. Academic librarians have vital roles in a more open and transparent sharing of publication data through collaboration with scientific communities and non-profit publication data aggregators.

8 Bibliographic Management System: https://github.com/sul-dlss/sul_pub
9 Elsevier API: https://dev.elsevier.com/
10 Crossref API: https://www.crossref.org/services/metadata-delivery/rest-api/
11 ORCID API: https://orcid.org/organizations/integrators/API

Figure 2. Demonstration of the Easy Deposit 2 (ED2) deposit page with a randomly selected article.

Table 1. Three Easy Deposit 2 modules with functions provided and key technical details such as data collection and integration from different resources.