Open Access Article Processing Charges (OA APC) Longitudinal Study 2016 Dataset

This article documents the dataset Open Access Article Processing Charges (OA APC) Main 2016. The dataset was developed as part of a longitudinal study of the minority (about a third) of fully open access journals that use the APC business model. APC data for 2016, 2015, 2014, and 2013 are obtained primarily from publishers' websites. This process requires analytic skill, because many publishers offer a diverse range of pricing options, including multiple currencies; differential pricing by article type, length, or work involved; and discounts for authors who contribute to editing or to the society publisher, or based on perceived ability to pay. This version of the dataset draws heavily on the work of Walt Crawford and includes his entire 2011–2015 dataset; in particular, Crawford's work has made it possible to confirm "no publication fee" status for a large number of journals. The dataset also includes DOAJ metadata for 2016 and 2014 and a 2010 APC sample provided by Solomon and Björk. The DOAJ metadata and the article counts from Crawford and from Solomon and Björk provide a basis for studying factors such as journal size, subject, or country of publication that might be worth testing for correlation with business model and/or APC size. Data Set: http://dx.doi.org/10.5683/SP/KC2NBV Data Set License: there is no specific license; see the article for details.


Summary
This article describes an update and expansion of the preliminary 2015 dataset described in Data [1,2]. The dataset includes information on open access journals derived from publisher websites and the Directory of Open Access Journals (DOAJ), and was developed as the base for a longitudinal study of the article processing charge (APC) business model used by about a third of open access journals. In the APC model, a payment is made by an author, institution, or funding agency for publishing an article, so that the article can be freely available to everyone (open access). The dataset also includes 2015 APC data provided by Crawford [3], 2010 APC data provided by Solomon and Björk [4], a smaller set of pilot project data collected by the research team in 2013, and a fuller set of APC data collected by the research team in 2014 and 2015. It further includes data on APC sub-models (e.g., variations in pricing, page versus article charges), an analysis of publisher type, and a custom subject analysis. To date, these data have been used as the basis for a 2014 DOAJ APC survey [5]. This project, Sustaining the Knowledge Commons (SKC), is funded by Canada's Social Sciences and Humanities Research Council. Research funders, libraries, scholars, and publishers currently take a keen interest in the economics of the transition to open access. This dataset is intended to facilitate and speed up the work of other researchers, and this document describing the data is needed to understand and analyze it.

Major Sources for This Dataset
Table 1 indicates the data types and sources for this dataset. Crawford's dataset is available for download as open data [3].

The primary purpose of this dataset is to support longitudinal study of APC pricing. For this reason, APC data from different years are presented in reverse chronological order: information essential for understanding the data is located near the left-hand side of the spreadsheet, while additional information for each year is located elsewhere in the spreadsheet. Because the spreadsheet is large and can be slow to manipulate, it may be useful when analyzing the data to make a copy, rearrange columns to facilitate comparison of the factors under study, and possibly remove the additional information. The following section details what is included in each column.

Limitations: open access journals from publishers that were not listed in DOAJ in either 2014 or 2016 are not included. Non-English journals, new journals, and journals by new publishers are probably under-represented in this sample. Errors in locating data (e.g., changed URLs, internet connectivity issues) may have a minor impact on the data.

The APC amount is determined by the SKC team as follows:

Explanation of Specific Data
• Costs associated with the publication that have nothing to do with paying for the service of coordinating peer review and publishing the article per se are not included (e.g., subscription rates, offprint sales).
• Where differential fees are provided for multiple article types, the type "research article" or its closest equivalent is selected.
• Where differential fees are provided based on other variations, the team aims to select the most common APC for local authors, i.e.:
  • the regular fee, not the discounted fee for society members or for authors from low/mid-income countries;
  • the regular fee, not including "extras" such as extra pages, editing services, taxes, etc.;
  • the fee for local authors, not the fees given for the rest of the world.
• Where different currencies are provided, it can be difficult to select a "main" currency; the following criteria are used to select the original currency:
  • currency of the country of publication;
  • first currency listed;
  • currency listed in DOAJ (column BJ);
  • currency previously recorded in the OA APC project (to facilitate longitudinal comparison).
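The fee-selection rules above can be read as a filter over the prices a publisher quotes. The following is a minimal, hypothetical sketch, not the SKC team's actual procedure: `FeeOption` and `select_apc` are illustrative names, each quoted price is assumed to have been tagged by hand (discounted, extra, local), and choosing a "closest equivalent" article type still needs human judgement. The currency criteria are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeeOption:
    """One price quoted on a publisher's website (hypothetical record type)."""
    amount: float
    currency: str
    article_type: str               # e.g. "research article", "review"
    discounted: bool = False        # society-member or low/mid-income discount
    is_extra: bool = False          # extra pages, editing services, taxes, etc.
    for_local_authors: bool = True  # False for "rest of the world" prices


def select_apc(options: List[FeeOption],
               preferred_type: str = "research article") -> Optional[FeeOption]:
    """Pick the regular, undiscounted fee for local authors.

    Returns None when no option survives the filters.
    """
    # Keep only regular fees: no discounts, no add-on charges, and the
    # price for local authors rather than for the rest of the world.
    regular = [o for o in options
               if not o.discounted and not o.is_extra and o.for_local_authors]
    if not regular:
        return None
    # Where fees differ by article type, prefer the research-article fee.
    wanted = preferred_type.strip().lower()
    for option in regular:
        if option.article_type.strip().lower() == wanted:
            return option
    # No exact match: fall back to the first remaining option. Choosing the
    # "closest equivalent" type properly needs human judgement.
    return regular[0]
```

In this sketch a discounted research-article price and an extra-pages charge are both skipped, so a journal quoting 750 (discounted), 1000 (review), and 1500 (research article) would be recorded at 1500.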

The following table explains in detail the data used in columns D and I, in alphabetical order (Table 2). See Table 3 below for the codes used in column O.

Important limitations regarding publisher type: in 2015, we researched publisher type for larger publishers in more depth than in 2014; for this reason, there are more mixed publisher types. Mixed types may nonetheless be under-represented because of limitations in our analysis: for larger commercial publishers, we assume all journals are commercial, but in some cases accurately identifying whether a partnership is involved requires in-depth reading about each journal (Tables 3 and 4).

Indication that a waiver may be considered in case of hardship (other than for medium- and low-income countries).

EF (2015) / FC (2014): Differential pricing for local authors
Whenever different pricing is given for authors in a particular region. In some cases "local" is assumed.

EG (2015) / FD (2014): Waivers/discounts for low/medium income countries
A common discount, often based on World Bank country classifications.

EH (2015) / FE (2014): Waivers/discounts based on contributions of work to journal (editing/reviewing)
"Based on contributions" is an assumption. Generally, these discounts are for editors, reviewers, etc.

• Columns P-AJ contain historical APC and APPC data. Columns P-U (2015), V-AA (2014), and AB-AG (2013) contain data for the years indicated, gathered by the SKC team; see the corresponding codes above for columns D-M to interpret them.
• Columns AH-AJ contain historical data from 2010 contributed to the project by Solomon and Björk (2012). Note that Solomon and Björk's methods differ slightly from those of the SKC project.
Their study used a random sample limited to APC-charging journals listed in DOAJ at the time, so it does not necessarily include every title by a given publisher that had APCs at that point. Solomon and Björk also estimated per-article costs for journals using the APPC (article page processing charge) method from the average number of pages, rather than treating APPC as a separate model.
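Solomon and Björk's conversion of page charges into a per-article figure presumably amounts to multiplying a journal's page charge by its average article length. In the notation below (ours, not theirs), c_j is journal j's charge per page and p̄_j its average number of pages per article; how that average was obtained is not described here.

```latex
\widehat{\mathrm{APC}}_j = c_j \times \bar{p}_j
```

For example, a hypothetical journal charging 100 per page whose articles average 12 pages would be assigned an estimated APC of 1,200 in the same currency, whereas the SKC data keep APPC journals as a distinct model.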

Walt Crawford's 2015 Data
• Columns AK to BC hold Walt Crawford's data from Gold Open Access Journals 2011-2015 [3]; see that citation for a detailed description. In addition to the data added to column D (APC data, particularly for free journals), Crawford's data include the number of articles published per year in each sampled journal, the country of publication, and potentially useful notes.
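Crawford's per-year article counts are one way to measure the journal size mentioned in the abstract as a factor worth testing for correlation with APC size. A minimal sketch of such a test, assuming the spreadsheet has been exported to CSV and read into dicts; the keys (for example `"articles"` and `"apc"`) are hypothetical stand-ins for one year's article-count column and an APC column. Pearson's r is shown for simplicity; if the values are heavily skewed, a rank-based measure such as Spearman's may be more appropriate.

```python
import math
from typing import Iterable, List, Mapping, Optional, Sequence, Tuple


def complete_pairs(rows: Iterable[Mapping[str, Optional[str]]],
                   x_key: str, y_key: str) -> Tuple[List[float], List[float]]:
    """Collect numeric (x, y) pairs, skipping rows where either cell is blank
    or not a number (e.g. a text code instead of an amount)."""
    xs: List[float] = []
    ys: List[float] = []
    for row in rows:
        try:
            # Strip thousands separators such as "1,200" before parsing.
            x = float((row.get(x_key) or "").replace(",", ""))
            y = float((row.get(y_key) or "").replace(",", ""))
        except ValueError:
            continue
        xs.append(x)
        ys.append(y)
    return xs, ys


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

Typical use: `xs, ys = complete_pairs(rows, "articles", "apc")`, then `pearson(xs, ys)`.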

DOAJ Metadata as of 1 February 2016 and 15 May 2014
• Columns BD-DF are metadata downloaded from the DOAJ website as of 1 February 2016, with added information in the column headings to identify the columns as DOAJ columns and indicate the year. Note that DOAJ Journal Title information is found in column B, which also contains title information from publisher websites. To identify the source of data in column B, refer to column BC "Publisher DOAJ 2016 02". If there is data in this column, the journal title in column B is from DOAJ. If there is no data in this column, the journal title comes from the publisher's website.
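The rule for telling where a column B title came from can be applied row by row. A minimal sketch, assuming each row has been read from a CSV export into a dict keyed by the column headers, and that the exported header text matches the "Publisher DOAJ 2016 02" heading quoted above; `title_source` is a hypothetical helper name.

```python
from typing import Mapping, Optional

# Header of the column that signals a DOAJ-derived title, as quoted in the text.
# The exact header text in an exported file is an assumption.
DOAJ_PUBLISHER_COLUMN = "Publisher DOAJ 2016 02"


def title_source(row: Mapping[str, Optional[str]]) -> str:
    """Say whether the column B title came from DOAJ or the publisher's site."""
    # A missing key, None, or a whitespace-only cell all count as "no data".
    value = row.get(DOAJ_PUBLISHER_COLUMN) or ""
    return "DOAJ" if value.strip() else "publisher website"
```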

Historical Qualitative Data
• Columns DY-EU (2015) and EV-FQ (2014) provide qualitative information for journals sampled in those years, as described below. See also the APC and APPC information for these years in columns E-M. Table 4, below, provides a column-by-column explanation of the APC and related information contributed by the research team in 2013, 2014, and 2015.

SKC Article Processing Charges/Article Page Processing Charges and Related Information
These data are in columns P to U (2015), V to AA (2014), and AB to AG (2013). Important limitation: although the list of variations (see Table 4, columns AD to DN) is long, not every variation, or even every common variation, is included. For example, we did not capture colour charges, which are quite common.
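Because the data are located by spreadsheet column letters, a small helper that converts letters to positions makes it easier to pull one year's SKC APC block out of a row read with `csv.reader`. A sketch, assuming the CSV export keeps the spreadsheet's column order; `col_index`, `YEAR_BLOCKS`, and `year_block` are illustrative names, and the ranges are those given for 2015, 2014, and 2013.

```python
from typing import Dict, List, Sequence, Tuple


def col_index(letters: str) -> int:
    """Convert a spreadsheet column label (A, Z, AA, ...) to a 0-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column label: {letters!r}")
        # Base 26 with digits 1..26 (column labels have no zero digit).
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# First and last columns of the SKC APC/APPC data for each year, from the text.
YEAR_BLOCKS: Dict[int, Tuple[str, str]] = {
    2015: ("P", "U"),
    2014: ("V", "AA"),
    2013: ("AB", "AG"),
}


def year_block(row: Sequence[str], year: int) -> List[str]:
    """Return the cells of one spreadsheet row holding the SKC data for `year`."""
    first, last = YEAR_BLOCKS[year]
    return list(row[col_index(first):col_index(last) + 1])
```

Each of the three ranges spans six columns, so the yearly blocks can be lined up position by position, on the assumption that the fields appear in the same order each year.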

• Columns FR to FU contain a custom subject analysis developed by the SKC team from the DOAJ 2014 metadata, which groups DOAJ's detailed subject headings into larger categories for analysis.
• Columns FV to GA are from the study by Solomon and Björk (2012); see also columns AH-AJ.
• Columns GB-GF record the provenance of the data in this study, e.g., journals that were in DOAJ in one of the years sampled but not another, and journals whose data were derived from the publisher's website but that were not in DOAJ or in the longitudinal study.

DG to DW: DOAJ 2014 Metadata
Columns DF to DV are from the DOAJ 2014 metadata (collected at the time of the first annual survey), except for titles added in 2015 that are on the publisher's website but not in DOAJ. Some 2015 data were added: see, for example, column GB "in DOAJ 2015 not 2014"; any DOAJ metadata for these titles were taken from the DOAJ 2015 dataset. Note also that our 2014 DOAJ file did not include keywords, so any keyword data are from 2015. Titles taken from the publisher's website that were not in DOAJ can be identified using column GD "not in DOAJ 2014 or 2015". Another indication that a title was on the publisher's website but not in DOAJ is that it has information only in columns DF and DG (publisher and title) (Table 5).
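The two indicators described above (the column GD flag, and data present only in the publisher and title columns) can be combined into a single filter. A minimal sketch, assuming rows are dicts from a CSV export and that any non-blank value in the GD column means the flag is set; that coding, the helper names, and the header names passed in for the DOAJ 2014 columns are all hypothetical.

```python
from typing import Iterable, List, Mapping, Optional, Sequence

# Header of column GD, as quoted in the text.
NOT_IN_DOAJ_FLAG = "not in DOAJ 2014 or 2015"


def _filled(row: Mapping[str, Optional[str]], key: str) -> bool:
    """True if the cell exists and contains more than whitespace."""
    return bool((row.get(key) or "").strip())


def probably_not_in_doaj(row: Mapping[str, Optional[str]],
                         doaj_2014_columns: Sequence[str],
                         id_columns: Sequence[str]) -> bool:
    """Apply the two indicators described in the text to one row."""
    # Indicator 1: the GD flag column is filled in (assumed coding).
    if _filled(row, NOT_IN_DOAJ_FLAG):
        return True
    # Indicator 2: among the DOAJ 2014 columns, only the publisher and
    # title columns (passed in as id_columns) contain data.
    filled = {c for c in doaj_2014_columns if _filled(row, c)}
    return bool(filled) and filled <= set(id_columns)


def titles_not_in_doaj(rows: Iterable[Mapping[str, Optional[str]]],
                       doaj_2014_columns: Sequence[str],
                       id_columns: Sequence[str],
                       title_key: str) -> List[str]:
    """List titles that look like publisher-website additions absent from DOAJ."""
    return [row.get(title_key) or "" for row in rows
            if probably_not_in_doaj(row, doaj_2014_columns, id_columns)]
```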

Using These Data (Licensing)
This dataset is derived from several sources: DOAJ metadata (which has its own license terms, posted on the DOAJ website), other data screen-scraped from DOAJ, factual data gathered from publishers' websites, 2015 data provided by Walt Crawford, 2010 data provided by Solomon and Björk, and our team's analysis. If you are making use of our dataset as a whole, please cite:
There is no license for the dataset as a whole, as its individual elements are derived from different sources that may have their own terms. When posting your own dataset, please include at minimum the journal title and ISSN, as these are the key matching points for merging different datasets.
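Since journal title and ISSN are the key matching points, a merge can try the ISSN first and fall back to a normalized title. A minimal sketch, assuming both datasets are lists of dicts with hypothetical `"ISSN"` and `"Title"` keys; `merge_records` and the normalizers are illustrative, and the ISSN check digit is not verified.

```python
import re
from typing import Dict, Iterable, List, Mapping, Optional


def norm_issn(value: Optional[str]) -> Optional[str]:
    """Normalize an ISSN to eight characters, e.g. '1234-567x' -> '1234567X'.

    Returns None if the value does not look like an ISSN.
    """
    if not value:
        return None
    cleaned = re.sub(r"[^0-9Xx]", "", value).upper()
    return cleaned if re.fullmatch(r"\d{7}[\dX]", cleaned) else None


def norm_title(value: Optional[str]) -> Optional[str]:
    """Lower-case the title and collapse runs of whitespace."""
    if not value:
        return None
    return " ".join(value.lower().split()) or None


def merge_records(left: Iterable[Mapping[str, str]],
                  right: Iterable[Mapping[str, str]],
                  issn_key: str = "ISSN",
                  title_key: str = "Title") -> List[Dict[str, str]]:
    """Left-join `right` onto `left`, matching on ISSN first, then on title.

    On conflicting keys the value from `left` is kept.
    """
    by_issn: Dict[str, Mapping[str, str]] = {}
    by_title: Dict[str, Mapping[str, str]] = {}
    for rec in right:
        issn = norm_issn(rec.get(issn_key))
        title = norm_title(rec.get(title_key))
        # setdefault keeps the first record seen for each key.
        if issn:
            by_issn.setdefault(issn, rec)
        if title:
            by_title.setdefault(title, rec)

    merged: List[Dict[str, str]] = []
    for rec in left:
        match: Optional[Mapping[str, str]] = None
        issn = norm_issn(rec.get(issn_key))
        if issn:
            match = by_issn.get(issn)
        if match is None:
            title = norm_title(rec.get(title_key))
            if title:
                match = by_title.get(title)
        combined = dict(rec)
        if match is not None:
            for key, value in match.items():
                combined.setdefault(key, value)
        merged.append(combined)
    return merged
```

Matching on ISSN first avoids false matches between journals with similar titles; the title fallback covers records, such as titles taken only from publisher websites, that lack a usable ISSN.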