We investigated the effects of data processing variables such as FreeSurfer version (v4.3.1, v4.5.0, and v5.0.0), workstation (Macintosh and Hewlett-Packard), and Macintosh operating system version (OSX 10.5 and OSX 10.6). Significant differences were revealed between FreeSurfer version v5.0.0 and the two earlier versions. [...] About a factor two smaller differences were detected between Macintosh and Hewlett-Packard workstations and between OSX 10.5 and OSX 10.6.
ReproZip tries to solve...
Workload & Time Challenges: Getting data and code ready to share, and then actually sharing them, is a time commitment.
Otherwise known as...
the Incentive Problem: Reproducibility takes time and is not always valued by the academic reward structure.
"Insufficient time is the main reason why scientists do not make their data and experiment available and reproducible." Carol Tenopir, Beyond the PDF2 Conference
"77% claim that they do not have time to document and clean up the code." Victoria Stodden, Survey of the Machine Learning Community – NIPS 2010
ReproZip tries to solve...
Technical Obsolescence: Changes in technology affect reproducibility.
Normative Dissonance: Espoused values don’t always match practice.
Otherwise known as...
the Pipeline Problem: Reproducibility requires skills that are not included in most curriculums!
"It would require huge amount of effort to make our code work with the latest versions of these tools." Collberg et al., Repeatability and Benefaction in Computer Systems Research, University of Arizona TR 14-04
ReproZip is a tool developed in Python (the tracing part in C) aimed at simplifying the process of creating reproducible... whatever. It can be research, it can be applications, it can be databases, it can be websites... if you can do it on a computer, chances are we can pack it!
2 Steps to Reproducibility
Step 1: Trace & Pack
reprozip trace [command]
reprozip pack package-name.rpz
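The two commands above can also be driven from a script. What follows is a minimal sketch, not part of ReproZip itself: the helper names (`trace_command`, `pack_command`, `trace_and_pack`) and the example experiment `python bechdel.py` are assumptions made for illustration. Only the `reprozip trace` and `reprozip pack` invocations come from the slides.

```python
import shlex
import shutil
import subprocess


def trace_command(experiment):
    """Build the argument list for `reprozip trace <experiment command>`."""
    return ["reprozip", "trace"] + shlex.split(experiment)


def pack_command(package_name):
    """Build the argument list for `reprozip pack <package>.rpz`."""
    if not package_name.endswith(".rpz"):
        package_name += ".rpz"
    return ["reprozip", "pack", package_name]


def trace_and_pack(experiment, package_name):
    """Run both steps in order (hypothetical helper).

    Returns False without doing anything if the `reprozip` executable
    is not on the PATH.
    """
    if shutil.which("reprozip") is None:
        return False
    subprocess.run(trace_command(experiment), check=True)
    subprocess.run(pack_command(package_name), check=True)
    return True
```

Calling `trace_and_pack("python bechdel.py", "bechdel")` would trace the experiment and then write `bechdel.rpz`. If you want to edit the config between the two steps, run the commands separately instead.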
Before you pack, you can edit the config.yml (optional, advanced usage of the tool).
# Inputs are files that are only read by a run; reprounzip can replace these
# files on demand to run the experiment with custom data.
# Outputs are files that are generated by a run; reprounzip can extract these
# files from the experiment on demand, for the user to examine.
# The name field is the identifier the user will use to access these files.
inputs_outputs:
- name: bechdel.json
  path: /home/vagrant/bechdel/bechdel.json
  written_by_runs: [0]
  read_by_runs: [1]
- name: revenue.csv
  path: /home/vagrant/bechdel/revenue.csv
  written_by_runs: [0]
  read_by_runs: [1]
- name: revenue.png
  path: /home/vagrant/bechdel/revenue.png
  written_by_runs: [1]
  read_by_runs: []
- ...
# These files come from packages; we can thus choose not to include them, as it
# will simply be possible to install that package on the destination system
# They are included anyway by default
packages:
  - name: "dash"
    version: "0.5.7-4+b1"
    size: 195584
    packfiles: true
    files:
      # Total files used: 122.46 KB
      # Installed package size: 191.00 KB
      - "/bin/dash" # 122.46 KB
      - "/bin/sh" # Link to /bin/dash
  - name: "libblas3"
    version: "1.2.20110419-10"
    size: 569344
    packfiles: true
    files:
      # Total files used: 511.23 KB
      # Installed package size: 556.00 KB
      - "/usr/lib/libblas/libblas.so.3" # Link to /usr/lib/libblas/libblas.so.3.0
      - "/usr/lib/libblas/libblas.so.3.0" # 511.23 KB
  - name: "libc-bin"
    version: "2.19-18+deb8u2"
    size: 3341312
    packfiles: true
    files:
      # Total files used: 870.03 KB
      # Installed package size: 3.19 MB
      - "/etc/gai.conf" # 2.52 KB
      - "/sbin/ldconfig" # 387.0 bytes
      - "/sbin/ldconfig.real" # 867.13 KB
  - ...
# These files do not appear to come with an installed package -- you probably
# want them packed
other_files:
- "/etc/hosts"
- "/etc/resolv.conf"
- "/home/vagrant/.cache/matplotlib/fontList.cache"
- "/home/vagrant/.cache/matplotlib/tex.cache"
- "/home/vagrant/.config/matplotlib"
- "/home/vagrant/bechdel/bechdel.json"
- "/home/vagrant/bechdel/bechdel.py"
- "/home/vagrant/bechdel/cpi.csv"
- "/home/vagrant/bechdel/fetch.py"
- "/home/vagrant/bechdel/imdb_data.json"
- "/home/vagrant/bechdel/revenue.csv"
- "/home/vagrant/bechdel/venv/bin/python"
- "/home/vagrant/bechdel/venv/lib/python-wheels/chardet-2.3.0-py2.py3-none-any.whl"
- "/home/vagrant/bechdel/venv/lib/python-wheels/pip-1.5.6-py2.py3-none-any.whl"
- "/home/vagrant/bechdel/venv/lib/python-wheels/requests-2.4.3-py2.py3-none-any.whl"
- "/home/vagrant/bechdel/venv/lib/python-wheels/setuptools-5.5.1-py2.py3-none-any.whl"
- "/home/vagrant/bechdel/venv/lib/python-wheels/urllib3-1.9.1-py2.py3-none-any.whl"
- "/home/vagrant/bechdel/venv/lib/python2.7/abc.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/codecs.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/copy_reg.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/distutils/__init__.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/distutils/distutils.cfg"
- "/home/vagrant/bechdel/venv/lib/python2.7/encodings/__init__.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/encodings/ascii.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/encodings/base64_codec.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/encodings/charmap.py"
- "/home/vagrant/bechdel/venv/lib/python2.7/fnmatch.py"
- ...
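To make the `written_by_runs` and `read_by_runs` fields in the config above concrete, here is a small sketch that works out, for a given run, which named files are its inputs and which are its outputs. The three entries are copied from the `inputs_outputs` section; the function name `files_for_run` and the plain-dict representation are assumptions made for illustration, not ReproZip's own API.

```python
# Entries copied from the inputs_outputs section of the config above.
INPUTS_OUTPUTS = [
    {"name": "bechdel.json", "written_by_runs": [0], "read_by_runs": [1]},
    {"name": "revenue.csv", "written_by_runs": [0], "read_by_runs": [1]},
    {"name": "revenue.png", "written_by_runs": [1], "read_by_runs": []},
]


def files_for_run(entries, run):
    """Return (inputs, outputs) for one run (hypothetical helper).

    Following the comments in the config: an input is a file the run
    only reads, and an output is a file the run generates.
    """
    inputs = [
        e["name"]
        for e in entries
        if run in e["read_by_runs"] and run not in e["written_by_runs"]
    ]
    outputs = [e["name"] for e in entries if run in e["written_by_runs"]]
    return inputs, outputs
```

With the entries above, run 0 has no inputs and produces `bechdel.json` and `revenue.csv`, and run 1 reads those two files and produces `revenue.png`.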
Tracing
Packing
Step 2: Set up & Run
Double-click the RPZ file and choose your unpacker!
Setting Up
Running
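The same two actions are also available from the `reprounzip` command line. The sketch below only builds the argument lists and does not execute anything. The command shapes (`reprounzip <unpacker> setup|run|upload|download ...`) and the unpacker names reflect my understanding of the reprounzip CLI and should be checked against `reprounzip --help`; the helper names, the package name `bechdel.rpz`, and the target directory `bechdel-run` are assumptions made for illustration.

```python
# Unpacker names assumed from the reprounzip documentation.
UNPACKERS = ("directory", "chroot", "docker", "vagrant")


def _check(unpacker):
    if unpacker not in UNPACKERS:
        raise ValueError(f"unknown unpacker: {unpacker!r}")


def setup_command(unpacker, package, target):
    """Unpack `package` into `target` using the chosen unpacker."""
    _check(unpacker)
    return ["reprounzip", unpacker, "setup", package, target]


def run_command(unpacker, target):
    """Re-run the experiment that was set up in `target`."""
    _check(unpacker)
    return ["reprounzip", unpacker, "run", target]


def upload_command(unpacker, target, local_path, input_name):
    """Replace a named input (see inputs_outputs) with a local file."""
    _check(unpacker)
    return ["reprounzip", unpacker, "upload", target,
            f"{local_path}:{input_name}"]


def download_command(unpacker, target, output_name, local_path):
    """Copy a named output (see inputs_outputs) to a local path."""
    _check(unpacker)
    return ["reprounzip", unpacker, "download", target,
            f"{output_name}:{local_path}"]
```

Each list can be passed to `subprocess.run(..., check=True)` once `reprounzip` and the chosen unpacker plugin are installed. The upload and download helpers use the `name` identifiers from the config, which is how a packed experiment can be re-run with custom data.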
Not just simple reproduction...
When you unpack your .rpz package with the GUI, you'll see:
The Gordon and Betty Moore Foundation & the Alfred P. Sloan Foundation, who support The Moore-Sloan Data Science Environment at NYU, which was vital to the development of ReproZip.