Open Access Article Processing Charges (OA APC) Longitudinal Study 2015 Preliminary Dataset

This article documents the Open access article processing charges (OA APC) longitudinal study 2015 preliminary dataset, available for download from the OA APC dataverse [1]. This dataset was gathered as part of Sustaining the Knowledge Commons (SKC), a research program funded by Canada's Social Sciences and Humanities Research Council. The overall goal of SKC is to advance our collective knowledge about how to transition scholarly publishing from a system dependent on subscriptions and purchase to one that is fully open access. The OA APC preliminary data 2015 Version 12 dataset was developed as one of the lines of research of SKC, a longitudinal study of the minority (about a third) of the fully open access journals that use the APC business model. The original idea was to gather data during an annual two-week census period. The volume of data and growth in this area make this an impractical goal. For this reason, we are posting this preliminary dataset in case it might be helpful to others working in this area. Future data gathering and analyses will be conducted on an ongoing basis. We encourage others to share their data as well. Note that the two most critical elements for matching data and merging datasets are the journal title and ISSN.

1. Summary

This dataset includes information on open access journals derived from the Directory of Open Access Journals (DOAJ), developed as the base for a longitudinal study on the open access article processing charges (APC) method used by about a third of open access journals. In the APC business model, a payment is made by an author, institution, or funding agency for publishing an article so that the article can be freely available to everyone (open access).

In addition to DOAJ metadata, this dataset includes 2010 APC data provided by Solomon and Björk [2], a smaller set of pilot project data collected by the research team in 2013, and a fuller set of data collected on APCs by the research team in 2014 and 2015. It also includes additional data relating to APC sub-model (e.g., variations in pricing, page versus article charges), analysis of publisher type, problematic (but possibly useful) article-level metadata screen scraped from DOAJ, and a custom subject analysis. To date, these data have been used as the basis for a 2014 DOAJ APC survey [3].

This project received funding from Canada's Social Sciences and Humanities Research Council under the Insight Development Grant program for 2014-2016. In 2016, the dataset will be updated and expanded to include publishers missed in 2015, at which point data analysis and preparation for a new survey article is anticipated. At present, there is keen interest from research funders, libraries, scholars, and publishers in the economics of transition to open access. This dataset will facilitate and speed up the work of other researchers, and this document describing the data is necessary to understand and analyse the data.
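Since journal title and ISSN are the two key elements for matching records across datasets, a merge can be sketched as follows. This is a minimal illustration and not the SKC team's procedure: the field names (`title`, `issn`, `apc_2010`, `apc_2015`) and the normalization rules are assumptions chosen for the example.

```python
import re
import unicodedata


def normalize_issn(issn):
    """Return an ISSN as 'NNNN-NNNC', or None if it does not have 8 characters."""
    chars = re.sub(r"[^0-9Xx]", "", issn or "").upper()
    if len(chars) != 8:
        return None
    return chars[:4] + "-" + chars[4:]


def normalize_title(title):
    """Lower-case a title, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def merge_records(ours, theirs):
    """Match each row of `ours` to `theirs` by ISSN first, then by title.

    Rows are plain dicts (hypothetical layout). Values from `ours` win
    when both sides carry the same key.
    """
    by_issn = {}
    by_title = {}
    for row in theirs:
        issn = normalize_issn(row.get("issn"))
        if issn:
            by_issn[issn] = row
        title = normalize_title(row.get("title"))
        if title:
            by_title[title] = row

    merged = []
    for row in ours:
        match = by_issn.get(normalize_issn(row.get("issn")))
        if match is None:
            match = by_title.get(normalize_title(row.get("title")))
        combined = dict(match or {})
        combined.update(row)
        merged.append(combined)
    return merged
```

Matching on ISSN first avoids false matches between journals with similar titles; the title fallback covers rows where one dataset lacks an ISSN. Unmatched rows are kept unchanged.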

Major Sources for this Dataset
Major sources of data for this dataset include:
- the Directory of Open Access Journals (DOAJ) downloadable metadata; the base set is from May 2014, with some additional data from the 2015 dataset;
- data on publisher article processing charges and related information gathered from publisher websites by the SKC team in 2015, 2014 (Morrison, Salhab, Calvé-Genest and Horava, 2015), and a 2013 pilot;
- DOAJ article content data screen scraped from DOAJ (caution: these data can be quite misleading due to limitations with article-level metadata);
- subject analysis based on DOAJ subject metadata in 2014 for selected journals;
- data on APCs gathered in 2010 by Solomon and Björk [2] (supplied by the authors). Note that Solomon and Björk use a different method of calculating APCs, so the numbers are not directly comparable; please refer to Solomon and Björk [2] for details on their methods.

Note that this full dataset includes some working columns, which are meaningful only as a means of explaining very specific calculations that are not necessarily evident in the dataset per se. Details below.

Significant Limitation
This dataset does not include new journals added to the DOAJ in 2015. A recent publisher size analysis indicates some significant changes. For example, De Gruyter, not listed in the 2014 survey, is now the third largest DOAJ publisher with over 200 titles. Elsevier is now the 7th largest DOAJ publisher. In both cases, gathering data from the publisher websites will be time-consuming, as it is necessary to conduct individual title look-up. Some OA APC data for newly added journals were gathered in May 2015 but have not yet been added to this dataset. One of the reasons for gathering these data is to compare the DOAJ "one price listed" approach with potentially richer data on the publisher's own website.

A to Q: DOAJ Metadata
Columns A to Q are DOAJ metadata, with the exception of titles added in 2015 that are on the publisher's website but not in DOAJ. Most of the DOAJ metadata are from 2014 (at the time of the first annual survey). Some 2015 data were added. See, for example, column DR, "in DOAJ 2015 not 2014": any DOAJ metadata for these titles were taken from the DOAJ 2015 dataset. Note also that our 2014 DOAJ file did not include keywords; any keyword data are from 2015. Titles that were taken from the publisher's website and were not in DOAJ can be identified using column DT, "not in DOAJ 2014 or 2015". A title that only has information in columns A and B (publisher and title) is another indication that the title was on the publisher's website but not in DOAJ (Table 1).

Important limitations with regard to publisher type: In 2015, we conducted more in-depth research on publisher type than in 2014 for larger publishers. For this reason, there are more mixed publisher types. It is possible that mixed types are under-represented due to limitations in our analysis. That is, for larger commercial publishers, we assume all journals are commercial, but in some cases it takes in-depth reading about each journal to accurately identify whether a partnership is involved (Tables 2 and 3).
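Because this documentation refers to columns by spreadsheet letter (A to Q, DR, DT, and so on), a small helper for translating letters to positions may be useful when the file is read as rows of values. This is a sketch only: the function names are hypothetical, and it assumes the loaded rows keep the same column order as the spreadsheet.

```python
def column_index(letters):
    """Convert a spreadsheet column label such as 'A' or 'DR' to a 0-based index."""
    label = letters.strip().upper()
    if not label:
        raise ValueError("empty column label")
    index = 0
    for ch in label:
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Column labels are a bijective base-26 number with A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def column_slice(first, last):
    """Return a slice covering columns first..last inclusive."""
    return slice(column_index(first), column_index(last) + 1)
```

For example, `row[column_slice("A", "Q")]` would select the 17 DOAJ metadata cells from a row held as a list.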

Publisher size
Publisher size is the number of journals by publisher that have APCs. These data are derived either from the 2014 publisher size analysis, or from the 2015 full publisher website list where additional data were taken from the publisher website. These data are used in the analysis of APC journal portfolio size and to calculate the sampling factor for small APC journal publishers (fewer than 10 APC journals), as these are sampled on a random basis.
An important limitation of these data is that they have not been updated to reflect the 2015 DOAJ metadata set.
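As an illustration of the publisher size definition, the count of APC journals per publisher and the small-publisher test can be sketched as below. The input layout (pairs of publisher name and a has-APC flag) is an assumption made for the example, not the layout of the dataset itself.

```python
from collections import Counter

# Threshold from the text: publishers with fewer than 10 APC journals are sampled.
SMALL_PUBLISHER_THRESHOLD = 10


def publisher_sizes(journals):
    """Count journals with APCs per publisher.

    `journals` is an iterable of (publisher, has_apc) pairs (hypothetical layout).
    """
    return Counter(publisher for publisher, has_apc in journals if has_apc)


def is_small_publisher(size):
    """True when a publisher falls in the randomly sampled group."""
    return size < SMALL_PUBLISHER_THRESHOLD
```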

Publisher type
Determined by members of the SKC team through analysis of the publisher's website. Codes are listed below.

Publication charges
Used in 2014 comparison of journals with confirmed publication charges and sample of 100 journals with no publication charges by publisher type.

Sampling factor
Based on publisher size. Used as a correction for calculating overall average APC to reflect sampling of smaller publishers.

Important limitation: Although the list of variations (see Table 4, columns AD to DN) is long, not every variation or even every common variation is included. For example, we did not capture colour charges, which are quite common.
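One way a sampling factor can correct an overall average is to treat it as a weight, so that each sampled journal from a small publisher stands in for the journals that were not sampled. The sketch below shows that idea only. The exact definition of the sampling factor in the dataset is not stated here, so using it directly as a weight is an assumption, as is the premise that all APCs have already been converted to a single currency.

```python
def weighted_mean_apc(rows):
    """Weighted mean of APCs.

    `rows` is an iterable of (apc, sampling_factor) pairs. The sampling
    factor is assumed (not confirmed by the source) to be the number of
    journals each sampled journal represents.
    """
    pairs = list(rows)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("sampling factors sum to zero")
    return sum(apc * weight for apc, weight in pairs) / total_weight
```

With one fully surveyed journal at 1000 (factor 1) and one sampled journal at 200 standing in for four journals (factor 4), the weighted mean is (1000 + 800) / 5 = 360, compared with an unweighted mean of 600.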

APC and APPC Details
Article processing charge as listed on the publisher's website, in the original currency. In the vast majority of cases, these data were gathered during the census period of 15-30 May; however, in some cases missing data were gathered in the fall of 2015.

Where more than one currency is listed (this is common), we select what appears to be the primary currency, e.g., the first currency listed or the currency for local authors. Where pricing is available for different types of articles, the price for research articles is selected. Where discounted pricing is listed, the price before discounts is selected. Where a price is given for up to a certain number of pages, this is the price listed.

"0" in this column indicates that a publisher clearly uses APCs for this journal but that publishing is currently free. For example, Hindawi regularly offers free publishing for their journals on a rotating basis. "No publication charge" means that we have confirmed that the journal does not have any fees associated with publication. "No cost found" means that we could not confirm whether or not there is an APC. "Title not found" means that we were not able to confirm whether the journal still exists or not.
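The coding of this column mixes numbers with text values, so analysis usually needs to separate them first. The following sketch classifies a cell according to the meanings described above. The status labels (`apc`, `apc_waived`, and so on) and the handling of thousands separators are assumptions made for the example and are not part of the dataset.

```python
import math


def classify_apc(cell):
    """Return (status, amount) for one APC cell.

    Status labels are hypothetical. The amount is a float in the original
    currency, or None when the cell carries no price.
    """
    text = str(cell).strip()
    lowered = text.lower()
    if lowered == "no publication charge":
        return ("no_charge", None)       # confirmed: no publication fees
    if lowered == "no cost found":
        return ("unknown", None)         # could not confirm whether an APC exists
    if lowered == "title not found":
        return ("title_not_found", None) # could not confirm the journal still exists
    try:
        amount = float(text.replace(",", ""))
    except ValueError:
        return ("unparsed", None)
    if not math.isfinite(amount):
        return ("unparsed", None)
    if amount == 0:
        return ("apc_waived", 0.0)       # APC journal, publishing currently free
    return ("apc", amount)
```

Keeping "0" distinct from "No publication charge" matters for averages: the first is an APC journal with a temporary waiver, while the second is not an APC journal at all.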

BF to BP: PubNumber
This is a rough indication of the number of articles per journal for journals that provide article-level metadata to DOAJ. After completing the gathering of data from DOAJ, a study comparing these data with actual journal publishing data uncovered a problem with article metadata supplied to DOAJ, which makes these data highly unreliable. These data are included in the full dataset on the premise that flawed data are better than no data.

Using these Data (Licensing)
This dataset is derived from several sources, including the DOAJ metadata (which has its own license terms posted on the DOAJ website), other data screen-scraped from DOAJ, factual data gathered from publishers' websites, 2010 data provided by Solomon and Björk, and our team's analysis. If you are making use of our dataset as a whole, please cite: Morrison, H.; Salhab, J.; Mondésir, G.; Calvé-Genest, A.; & Villamizar, C.; Desautels, L. Open access article processing charges longitudinal study 2015 preliminary dataset [http://dataverse.scholarsportal.info/dvn/dv/oaapc]. If you are drawing from the other sources, please cite the other sources. There is no license for the dataset as a whole, as individual elements are derived from different sources, which may have their own terms.

3.6. BQ to BT
These are working columns for a small study comparing journals by small publishers with and without APCs, as described on the Sustaining the Knowledge Commons blog: Publication Charge (1 = yes, 0 = no); DOAJ No Charges (Sampling); Publisher Size (DOAJ No Charges); DOAJ Confirmed charges (sampled).

3.7. BU to BX
Subject classification: This is an SKC grouping of subject classifications intended to roughly mirror the work of Solomon and Björk for comparison purposes.

3.10. CK: Preliminary Sample 2015
Y for all titles for which we had data from 2010, 2013, or 2014, to permit the longitudinal analysis.

3.11. DP to DR
Columns for entering APC data listed in DOAJ for journals, added after March 2014, that have charges. Data not yet entered.

3.12. DR to DU: For Recording Changes in DOAJ from 2014 to 2015 Relevant to Journals Sampled
DR: In DOAJ 2015 not 2014. Y for titles added, based on publisher website information, that were not included in the 2014 sample.
DS: In DOAJ 2014 not 2015. Y for titles from our sample in 2014 that we could not find in the 2015 DOAJ metadata file.
DT: Not in DOAJ 2014 or 2015. Y for titles drawn from publisher websites that were not in DOAJ either year.
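The three change-tracking flags can be read as set-membership tests against the two DOAJ title lists. The sketch below shows that reading. It is a simplification that looks only at whether a title appears in each year's list, and the function and argument names are hypothetical.

```python
def doaj_change_flags(title, titles_2014, titles_2015):
    """Return the DR, DS and DT flags ('Y' or '') for one journal title.

    `titles_2014` and `titles_2015` are sets of titles from the respective
    DOAJ metadata files (assumed to be already normalized for comparison).
    """
    in_2014 = title in titles_2014
    in_2015 = title in titles_2015
    return {
        "DR": "Y" if in_2015 and not in_2014 else "",      # in DOAJ 2015 not 2014
        "DS": "Y" if in_2014 and not in_2015 else "",      # in DOAJ 2014 not 2015
        "DT": "Y" if not in_2014 and not in_2015 else "",  # not in DOAJ either year
    }
```

A title present in both years gets no flag, and at most one of the three flags is set for any title.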

Table 1. Columns A to Q.
Column | Column Title and Notes/Deviations from DOAJ Metadata
A | Publisher (from DOAJ; occasional clean-up to facilitate gathering, e.g., typo correction)
B | Title (almost always directly from DOAJ; occasional small variations due to title name or
Note: Always "Yes" in DOAJ as of 2014 and 2015. This is incorrect, as only some DOAJ journals provide article-level metadata, which this column is intended to indicate.

3.2. R to U: Publisher Size (Publisher APC Journal List) and Type (Commercial, Society, etc.)

Table 2. Columns R to U.

Table 3. Publisher Type-Codes.

SKC Article Processing Charges/Article Page Processing Charges and Related Information
Table 4, below, provides a column-by-column explanation of the APC and related information contributed by the research team in 2013, 2014, and 2015.

Table 4. Article Processing Charges/Article Page Processing Charges and related information.
Notes: "Based on contributions" is an assumption. Generally, discounts refer to discounts for editors, reviewers, etc.