Abstract
This article documents the Open access article processing charges (OA APC) longitudinal study 2015 preliminary dataset, available for download from the OA APC dataverse [1]. This dataset was gathered as part of Sustaining the Knowledge Commons (SKC), a research program funded by Canada’s Social Sciences and Humanities Research Council. The overall goal of SKC is to advance our collective knowledge about how to transition scholarly publishing from a system dependent on subscriptions and purchase to one that is fully open access. The OA APC preliminary data 2015 Version 12 dataset was developed as part of one of SKC’s lines of research: a longitudinal study of the minority (about a third) of fully open access journals that use the APC business model. The original idea was to gather data during an annual two-week census period. The volume of data and the growth in this area make this an impractical goal. For this reason, we are posting this preliminary dataset in case it might be helpful to others working in this area. Future data gathering and analyses will be conducted on an ongoing basis. We encourage others to share their data as well. When merging datasets, note that the two most critical elements for matching records are the journal title and ISSN.
Data Set License: There is no license for the dataset as a whole, as individual elements are derived from different sources, which may have their own terms.
1. Summary
This dataset includes information on open access journals derived from the Directory of Open Access Journals (DOAJ), developed as the base for a longitudinal study on the open access article processing charges (APC) method used by about a third of open access journals. In the APC business model, a payment is made by an author, institution, or funding agency for publishing an article so that the article can be freely available to everyone (open access). In addition to DOAJ metadata, this dataset includes 2010 APC data provided by Solomon and Björk [2], a smaller set of pilot project data collected by the research team in 2013, and a fuller set of data collected on APCs by the research team in 2014 and 2015. It also includes additional data relating to APC sub-model (e.g., variations in pricing, page versus article charges), an analysis of publisher type, problematic (but possibly useful) article-level metadata screen scraped from DOAJ, and a custom subject analysis. To date, these data have been used as the basis for a 2014 DOAJ APC survey [3]. This project received funding from Canada’s Social Sciences and Humanities Research Council under the Insight Development Grant program for 2014–2016. In 2016, the dataset will be updated and expanded to include publishers missed in 2015, at which point data analysis and preparation of a new survey article are anticipated. At present, there is keen interest from research funders, libraries, scholars, and publishers in the economics of the transition to open access. This dataset will facilitate and speed up the work of other researchers, and this document describing the data is necessary to understand and analyse the data.
2. Data Description, Method and Limitations
2.1. Major Sources for this Dataset
Major sources of data for this dataset include:
- the Directory of Open Access Journals (DOAJ) downloadable metadata; the base set is from May 2014, with some additional data from the 2015 dataset
- data on publisher article processing charges and related information gathered from publisher websites by the SKC team in 2015, in 2014 (Morrison, Salhab, Calvé-Genest and Horava, 2015 [3]), and in a 2013 pilot
- DOAJ article content data screen scraped from DOAJ (caution: these data can be quite misleading due to limitations with article-level metadata)
- Subject analysis based on DOAJ subject metadata in 2014 for selected journals
- Data on APCs gathered in 2010 by Solomon and Björk [2] (supplied by the authors). Note that Solomon and Björk use a different method of calculating APCs, so the numbers are not directly comparable; please refer to Solomon and Björk [2] for details on their methods.
- Note that this full dataset includes some working columns. These are meaningful only in the context of very specific calculations that are not necessarily evident from the dataset itself. Details are given below.
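Because this dataset combines records from several sources, and because journal title and ISSN are the most critical elements for matching, a merge with another dataset might be sketched as follows. This is a minimal illustration in Python, not the method used by the research team. The field names `title`, `issn`, and `apc_2010`, the normalization rules, and the choice to try the ISSN first and fall back to the title are all assumptions made for the example.

```python
import re


def normalize_issn(issn):
    """Return an ISSN as 'NNNN-NNNC' in upper case, or '' if it is not 8 characters."""
    compact = re.sub(r"[^0-9Xx]", "", issn or "").upper()
    return f"{compact[:4]}-{compact[4:]}" if len(compact) == 8 else ""


def normalize_title(title):
    """Lower-case a title and collapse punctuation and whitespace for matching."""
    return " ".join(re.sub(r"[^\w\s]", " ", (title or "").lower()).split())


def merge_by_issn_then_title(base_rows, other_rows):
    """Left-join other_rows onto base_rows: ISSN first, title as a fallback."""
    by_issn, by_title = {}, {}
    for row in other_rows:
        issn = normalize_issn(row.get("issn"))
        title = normalize_title(row.get("title"))
        if issn:
            by_issn.setdefault(issn, row)    # keep the first row seen per ISSN
        if title:
            by_title.setdefault(title, row)  # keep the first row seen per title

    merged = []
    for row in base_rows:
        match = by_issn.get(normalize_issn(row.get("issn")))
        if match is None:
            match = by_title.get(normalize_title(row.get("title")))
        combined = dict(match or {})
        combined.update(row)                 # values from the base row win on conflicts
        combined["matched"] = match is not None
        merged.append(combined)
    return merged


# Hypothetical records, for illustration only.
base = [
    {"title": "Journal of Examples", "issn": "1234-5678"},
    {"title": "Another  Journal!", "issn": ""},
    {"title": "Unmatched Title", "issn": "0000-0000"},
]
other = [
    {"title": "J. Examples", "issn": "12345678", "apc_2010": "500"},
    {"title": "another journal", "issn": None, "apc_2010": "0"},
]
merged = merge_by_issn_then_title(base, other)
```

Title matching is considerably less reliable than ISSN matching (journals change titles and titles are not unique), so rows matched only by title are worth checking by hand.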
2.2. Significant Limitation
- This dataset does not include new journals added to the DOAJ in 2015. A recent publisher size analysis indicates some significant changes. For example, De Gruyter, not listed in the 2014 survey, is now the third largest DOAJ publisher with over 200 titles. Elsevier is now the 7th largest DOAJ publisher. In both cases, gathering data from the publisher websites will be time-consuming because individual title look-up is necessary.
- Some OA APC data for newly added journals were gathered in May 2015 but have not yet been added to this dataset. One of the reasons for gathering these data is to compare the DOAJ “one price listed” approach with potentially richer data on the publisher’s own website.
3. Explanation of Specific Data
3.1. A to Q: DOAJ Metadata
Columns A to Q are DOAJ metadata, with the exception of titles added in 2015 that are on the publisher’s website but not in DOAJ. Most of the DOAJ metadata are from 2014 (at the time of the first annual survey). Some 2015 data were added. See, for example, column DR “in DOAJ 2015 not 2014”; any DOAJ metadata for these titles were taken from the DOAJ 2015 dataset. Note also that our 2014 DOAJ file did not include keywords; any keyword data are from 2015. Titles that were taken from the publisher’s website and were not in DOAJ can be identified using column DT “not in DOAJ 2014 or 2015”. A title with information only in columns A and B (publisher and title) is another indication that the title was on the publisher’s website but not in DOAJ (Table 1).
Table 1. Columns A to Q.
3.2. R to U: Publisher Size (Publisher APC Journal List) and Type (Commercial, Society, etc.)
Important limitations with regard to publisher type: In 2015, we conducted more in-depth research on publisher type for larger publishers than we did in 2014. For this reason, there are more mixed publisher types. It is possible that mixed types are still under-represented due to limitations in our analysis. That is, for larger commercial publishers, we assume all journals are commercial, but in some cases it takes in-depth reading about each journal to identify accurately whether a partnership is involved (Table 2 and Table 3).
Table 2. Columns R to U.
Table 3. Publisher type codes.
3.3. SKC Article Processing Charges/Article Page Processing Charges and Related Information
Table 4, below, provides a column-by-column explanation of the APC and related information contributed by the research team in 2013, 2014, and 2015.
- V to AX (2014)
- AY to BD (2013)
- CL to DO (2015)
Table 4. Article Processing Charges/Article Page Processing Charges and related information.
Important limitation: Although the list of variations (see Table 4, columns AD to DN) is long, not every variation or even every common variation is included. For example, we did not capture colour charges, which are quite common.
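The column ranges in this document are given as spreadsheet letters (for example, V to AX). When the file is read programmatically, rows are usually lists indexed from zero, so the letters need converting. The sketch below shows one way to do this in Python. The helper names `column_index` and `column_slice` and the `SECTIONS` mapping are hypothetical, and the example assumes the column order in the loaded file matches the letters used here.

```python
def column_index(letters):
    """Convert spreadsheet column letters (e.g. 'A', 'AX', 'DR') to a zero-based index."""
    cleaned = letters.strip().upper()
    if not cleaned or not all("A" <= ch <= "Z" for ch in cleaned):
        raise ValueError(f"not a spreadsheet column: {letters!r}")
    index = 0
    for ch in cleaned:
        # Column letters behave like a base-26 number with digits A=1 .. Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def column_slice(first, last):
    """Return a slice covering the inclusive column range first..last."""
    return slice(column_index(first), column_index(last) + 1)


# Column ranges as described in this document (labels are illustrative).
SECTIONS = {
    "DOAJ metadata": column_slice("A", "Q"),
    "APC data 2014": column_slice("V", "AX"),
    "APC data 2013": column_slice("AY", "BD"),
    "APC data 2015": column_slice("CL", "DO"),
}

# Usage with a row read as a list of cell values (dummy data here).
row = list(range(200))
apc_2013_cells = row[SECTIONS["APC data 2013"]]
```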
3.4. APC and APPC Details
This is the article processing charge as listed on the publisher’s website, in the original currency. In the vast majority of cases, these data were gathered during the census period of 15–30 May; however, in some cases missing data were gathered in the fall of 2015.
Where more than one currency is listed (this is common), we select what appears to be the primary currency, e.g., the first currency listed or the currency for local authors. Where pricing is available for different types of articles, the price for research articles is selected. Where discounted pricing is listed, the price before discounts is selected. Where a price is given for up to a certain number of pages, this is the price listed. “0” in this column indicates that a publisher clearly uses APCs for this journal but that publishing is currently free. For example, Hindawi regularly offers free publishing for its journals on a rotating basis. “No publication charge” means that we have confirmed that the journal does not have any fees associated with publication. “No cost found” means that we could not confirm whether or not there is an APC. “Title not found” means that we were not able to confirm whether the journal still exists or not.
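Because this column mixes numeric prices with text codes, anyone analysing it needs to separate the cases before computing averages. In particular, “0” (APC model, currently free) should not be conflated with “No publication charge”. The following Python sketch shows one possible way to do this. The category labels and the function name `classify_apc` are hypothetical, and the exact spelling of the text codes in the data file is assumed to match the wording above.

```python
from decimal import Decimal, InvalidOperation

# Text codes described above, mapped to illustrative category labels.
STATUS_LABELS = {
    "no publication charge": "no_charge_confirmed",
    "no cost found": "apc_unconfirmed",
    "title not found": "title_not_found",
}


def classify_apc(cell):
    """Return (category, amount) for one APC cell; amount is None for text codes."""
    text = str(cell).strip()
    label = STATUS_LABELS.get(text.lower())
    if label:
        return label, None
    try:
        amount = Decimal(text.replace(",", ""))  # tolerate thousands separators
    except InvalidOperation:
        return "unrecognized", None
    if not amount.is_finite() or amount < 0:
        return "unrecognized", None
    if amount == 0:
        # The journal uses APCs, but publishing is currently free.
        return "apc_model_currently_free", amount
    return "apc_charged", amount


examples = ["1,350", "0", "No publication charge", "No cost found", "Title not found"]
categories = [classify_apc(value)[0] for value in examples]
```

Amounts are in the original currency, so any comparison across journals would also need a currency conversion step, which is not shown here.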
3.5. BF to BP: PubNumber
This is a rough indication of the number of articles per journal for journals that provide article-level metadata to DOAJ. After data gathering from DOAJ was complete, a study comparing these data with actual journal publishing data uncovered a problem with the article metadata supplied to DOAJ, which makes these data highly unreliable. They are included in the full dataset on the premise that flawed data are better than no data.
3.6. BQ to BT
These are working columns for a small study comparing journals by small publishers with and without APCs as described on the Sustaining the Knowledge Commons blog:
- Publication Charge (1 = yes, 0 = no)
- DOAJ No Charges (Sampling)
- Publisher Size (DOAJ No Charges)
- DOAJ Confirmed charges (sampled).
3.7. BU to BX
Subject classification: This is an SKC grouping of subject classifications intended to roughly mirror the work of Solomon and Björk for comparison purposes.
3.8. BY to CG
Data supplied by Solomon and Björk.
3.9. CH to CJ
Working columns.
3.10. CK: Preliminary Sample 2015
Y for all titles for which we had data from 2010, 2013, or 2014, to permit the longitudinal analysis.
3.11. DP to DR
Columns for entering APC data listed in DOAJ for journals, added after March 2014, that have charges. Data not yet entered.
3.12. DR to DU: For Recording Changes in DOAJ from 2014 to 2015 Relevant to Journals Sampled
DR: In DOAJ 2015 not 2014. Y for titles added, based on publisher website information, that were not included in the 2014 sample.
DS: In DOAJ 2014 not 2015. Y for titles from our sample in 2014 that we could not find in the 2015 DOAJ metadata file.
DT: Not in DOAJ 2014 or 2015. Y for titles drawn from publisher websites that were not in DOAJ in either year.
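The three flag columns can be combined into a single status per title, which may be convenient when filtering the sample. The Python sketch below is only an illustration: it assumes each row is a dictionary keyed by column letter, that a flag is set when the cell contains “Y”, and that the status labels and the order in which flags are checked are acceptable for the reader’s purpose.

```python
def doaj_status(row):
    """Summarize the DR, DS and DT flags of one row as a single label."""
    def flagged(column):
        return str(row.get(column, "")).strip().upper() == "Y"

    if flagged("DT"):
        return "not_in_doaj_either_year"   # drawn from the publisher website only
    if flagged("DR"):
        return "in_doaj_2015_not_2014"
    if flagged("DS"):
        return "in_doaj_2014_not_2015"
    return "no_change_flagged"


# Hypothetical rows, for illustration only.
rows = [
    {"A": "Publisher X", "B": "Journal One", "DR": "Y"},
    {"A": "Publisher X", "B": "Journal Two", "DS": "y"},
    {"A": "Publisher Y", "B": "Journal Three", "DT": "Y"},
    {"A": "Publisher Z", "B": "Journal Four"},
]
statuses = [doaj_status(row) for row in rows]
```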
4. Using these Data (Licensing)
This dataset is derived from several sources, including the DOAJ metadata (which has its own license terms posted on the DOAJ website), other data screen-scraped from DOAJ, factual data gathered from publishers’ websites, 2010 data provided by Solomon and Björk, and our team’s analysis. If you are making use of our dataset as a whole, please cite: Morrison, H.; Salhab, J.; Mondésir, G.; Calvé-Genest, A.; Villamizar, C.; Desautels, L. Open access article processing charges longitudinal study 2015 preliminary dataset [http://dataverse.scholarsportal.info/dvn/dv/oaapc]. If you are drawing from the other sources, please cite the other sources. There is no license for the dataset as a whole, as individual elements are derived from different sources, which may have their own terms.
Acknowledgments
The authors gratefully acknowledge funding provided by Canada’s Social Sciences and Humanities Research Council under an Insight Development Grant for the Sustaining the Knowledge Commons research program of which this project forms a part.
Author Contributions
Heather Morrison: Principal Investigator, project design, supervision, primary drafter and data gatherer. Data gathering and analysis: Alexis Calvé-Genest, Lisa Desautels, Guinsly Mondésir, Heather Morrison, Jihane Salhab, and César Villamizar. César Villamizar conducted analysis using analytic and data visualization software and standard office applications, developed the research data management plan, including the identification and classification of electronic files, naming conventions and file version control.
Conflicts of Interest
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
- Morrison, H.; Salhab, J.; Mondésir, G.; Calvé-Genest, A.; Villamizar, C.; Desautels, L. Open access article processing charges longitudinal study 2015 preliminary dataset. Available online: http://dataverse.scholarsportal.info/dvn/dv/oaapc/ (accessed on 22 March 2016).
- Solomon, D.J.; Björk, B.C. A study of open access journals using article processing charges. J. Am. Soc. Inf. Sci. Technol. 2012, 63, 1485–1495.
- Morrison, H.; Salhab, J.; Calvé-Genest, A.; Horava, T. Open Access Article Processing Charges: DOAJ Survey May 2014. Publications 2015, 3, 1–16.
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution-NonCommercial-NoDerivatives (CC-BY-NC-ND) license (https://creativecommons.org/licenses/by-nc-nd/4.0/).