Open Access Article
Informatics 2018, 5(1), 12; https://doi.org/10.3390/informatics5010012

Using Introspection to Collect Provenance in R

1 Computer Science Department, Mount Holyoke College, South Hadley, MA 01075, USA
2 Harvard Forest, Harvard University, Petersham, MA 01366, USA
3 Harvard College, Cambridge, MA 02138, USA
* Author to whom correspondence should be addressed.
Current address: Google Inc., Mountain View, CA 94043, USA.
Received: 1 December 2017 / Revised: 25 February 2018 / Accepted: 26 February 2018 / Published: 1 March 2018
(This article belongs to the Special Issue Using Computational Provenance)

Abstract

Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https://github.com/End-to-end-provenance/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. It does this using R’s powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.
Keywords: scientific data provenance; provenance capture; provenance granularity; R; introspection
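
The mechanism outlined in the abstract (parse each statement before it runs, record what it reads and writes, and store the result as a graph that can be walked backwards from an output) can be illustrated with a small sketch. This is an analogy written in Python using its standard `ast` module, not RDataTracker's implementation and not R code. The function names `collect_provenance` and `lineage`, the node labels (`p1`, `d1`, ...), and the dictionary layout are all hypothetical, and the sketch only handles plain top-level assignments.

```python
import ast


def collect_provenance(script):
    """Run `script` one top-level statement at a time and record a tiny
    derivation graph (hypothetical layout, for illustration only)."""
    env = {}          # namespace in which the script's statements execute
    last_writer = {}  # variable name -> data node holding its latest value
    data = {}         # data node id -> (variable name, recorded value)
    edges = []        # (source node, destination node)

    for i, stmt in enumerate(ast.parse(script).body, start=1):
        proc = f"p{i}"  # one procedure node per top-level statement

        # Inspect the parsed statement BEFORE running it to learn which
        # names it reads and which it assigns.
        reads, writes = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    reads.add(node.id)
                else:
                    writes.add(node.id)

        # Data-in edges: only variables produced earlier by this script.
        for name in sorted(reads):
            if name in last_writer:
                edges.append((last_writer[name], proc))

        # Hand the single statement to the interpreter.
        module = ast.Module(body=[stmt], type_ignores=[])
        exec(compile(module, "<script>", "exec"), env)

        # Data-out edges: snapshot each intermediate value that was assigned.
        for name in sorted(writes):
            node_id = f"d{len(data) + 1}"
            data[node_id] = (name, env[name])
            edges.append((proc, node_id))
            last_writer[name] = node_id

    return {"data": data, "edges": edges, "last_writer": last_writer}


def lineage(graph, name):
    """Return every node that the latest value of `name` depends on."""
    parents = {}
    for src, dst in graph["edges"]:
        parents.setdefault(dst, []).append(src)

    seen, stack = set(), [graph["last_writer"][name]]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

For a four-line script such as `a = 2`, `b = 3`, `c = a * b`, `d = b + 1`, calling `lineage(graph, "c")` on the collected graph returns the statements and values that `c` was derived from and leaves out the unrelated statement that computes `d`. Following the same edges in the forward direction would answer the complementary question of how an input value is used.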
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Lerner, B.; Boose, E.; Perez, L. Using Introspection to Collect Provenance in R. Informatics 2018, 5, 12.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Informatics, EISSN 2227-9709, published by MDPI AG, Basel, Switzerland.