Welcome!

Transparent Programming: Important Habits for Reproducibility and Research Integrity

Thanks!

  • To you for being here!

  • To the Society for Medical Decision Making for supporting the development of this workshop

  • To Madison Coots for helping develop this workshop!

Who are you? What brings you here?

  • E.g. your name, pronouns, background, and motivation for this workshop.

General guidelines (and advice!)

  • Ask for help when you need it!
    • And in the future you can ask in the Slack chat.

Please open the workshop site

You can find all workshop information at https://code4mdm.github.io.

Being reproducible

Is this enough?

  • Access to the code
  • Access to the data
  • (And let’s assume we can replicate the environment)

How confident do you feel?

This code is a kludgly, ugly, inefficient mess. (…) It is probably riddled with problems, mistakes, bugs, inefficiencies, vestigial code stubs, etc etc. (…) I am as confident as I am capable of being that all of the factual claims that were made in the manuscript are accurate.

We need to do more: we need to inspire trust.

  • The code is correct (and I have made it easy for you/someone to check)
  • My workflow is robust
  • My workflow itself is accessible, and I will be guiding you through it.

The Four Facets of Reproducibility

Documentation

What do you need to execute this project? Where do you start?

Organization

Demonstrate a trustworthy workflow.

Automation

Automated analyses trace your steps, and prevent human error (or at the very least: document it).

Dissemination

Share your data, release your code, publish your findings.

What will we do in this short course?

We will take you through a workflow (in a broad sense!)

  • Setting up a project

  • Establishing a robust backup / version control system

  • Writing documentation

  • Making your project accessible

What do we expect from you?

  • Our group has many different abilities and experiences. We hope you will value this as much as we do!

  • Be prepared to work with others around you.

  • Ask questions when you have them!

  • Feel free and safe to share your expertise and experiences.

Our objectives for you

We want to teach you good habits that will make your work more accessible, trustworthy, and reproducible by others. In doing so, we have tried to identify habits with a good return on investment, meaning they save you time in the not-so-long run.

And finally: we hope you enjoy the short course

Introduction to Git and GitHub :D

Research compendium

A research compendium is a collection of all digital parts of a research project including data, code, texts (…). The collection is created in such a way that reproducing all results is straightforward.

Source: The Turing Way

Getting started

  • Contain your project in a single recognizable folder

  • Distinguish folder types, name them accordingly:

    • Read-only: data, metadata
    • Human-generated: code, paper, documentation
    • Project-generated: clean data, figures, models…
  • Initialize a README file, document your project

  • Choose a license

  • Publish your project.
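One way to enforce the "read-only" folder type is to remove write permission from the raw data files once they are in place. The sketch below is a minimal illustration, not part of the workshop template; the function name `make_read_only` is hypothetical, and it assumes your raw data lives in a single folder.

```python
import stat
from pathlib import Path


def make_read_only(folder):
    """Remove write permission from every file under `folder` (hypothetical helper)."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    changed = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            # Keep all existing permission bits except the write bits.
            path.chmod(mode & ~write_bits)
            changed.append(path)
    return changed
```

Calling `make_read_only("data")` from the project folder would protect the raw files against accidental edits; anything your code produces still goes to the project-generated folders.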

Getting started

.
├── README.md
├── code
├── data
├── documents
│   ├── documentation
│   └── literature
└── output
    ├── figures
    └── tables
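
If you prefer to build this layout by hand rather than from a template, a few lines of Python will do it. This is a minimal sketch: the folder names are copied from the tree above, while the function name `create_skeleton` and the one-line README content are assumptions made for illustration.

```python
from pathlib import Path

# Folder layout copied from the tree above; adjust to your own project.
PROJECT_FOLDERS = [
    "code",
    "data",
    "documents/documentation",
    "documents/literature",
    "output/figures",
    "output/tables",
]


def create_skeleton(root):
    """Create the project folders and a starter README.md under `root`."""
    root = Path(root)
    for folder in PROJECT_FOLDERS:
        # parents=True builds nested folders; exist_ok=True makes reruns safe.
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Never overwrite a README that already has content.
        readme.write_text(f"# {root.resolve().name}\n", encoding="utf-8")
    return root
```

For example, `create_skeleton("my-project")` creates the folders inside `my-project` and leaves an existing README untouched.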

Cookiecutter

You can set up a project template using a nifty tool called cookiecutter.

First, ensure you have cookiecutter installed:

pip install cookiecutter

or with conda:

conda install cookiecutter

Cookiecutter

or try the following alternatives:

  • macOS with Homebrew:
brew install cookiecutter
  • Debian/Ubuntu:
sudo apt-get install cookiecutter

Cookiecutter

cookiecutter

A command-line utility that creates projects from cookiecutters (project templates). e.g. creating a Python package project from a Python package project template.

cookiecutter.readthedocs.io

There are MANY templates available for your purposes. Take a look!

Cookiecutter

We have designed a template based on my own personal preferences. Go ahead and install it!

cookiecutter gh:jacobjameson/cookiecutter-project

Answer the questions cookiecutter asks you, and browse the resulting project to see where your answers ended up.

A note on paths

  • Your project should be transportable between computers.

  • For this reason, you should use relative paths only: compare

    • /Users/jacob/Dropbox/data/zincfinger.json
    • ./data/zincfinger.json
  • ./ means: in this folder

  • ../ means: one folder up
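
In Python, `pathlib` makes it easy to keep paths relative to the project folder. The sketch below is one possible approach; the helper name `project_path` is hypothetical, and it assumes scripts are run from the top-level project folder unless a different root is passed in.

```python
from pathlib import Path


def project_path(relative, root=None):
    """Resolve `relative` against the project root (hypothetical helper)."""
    # Assumption: when no root is given, the current working directory
    # is the project's top-level folder.
    base = Path(root) if root is not None else Path.cwd()
    return (base / relative).resolve()


# "./" is this folder, "../" is one folder up.
data_file = Path("./data/zincfinger.json")
```

Because only the root changes between computers, `project_path("./data/zincfinger.json")` points at the right file wherever the project folder is copied.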

Choosing a license

  • Copyright is implicit; others cannot use your code without your permission.

  • Licensing gives that permission, and its boundaries and conditions.

  • Choosing a license early on means being aware of your license as the project proceeds (and not creating conflicts).

  • There are over 80 OSI-approved licenses (and many, many others) to choose from.

Choosing a license

This is one I like to use:

[Image: license]

What is important to you? What does your lab use? Choose your own license!

Publishing your project

Uh… Isn’t ‘publication’ the thing you do… at the end?

No! Publishing your project at an early stage:

  • forces you to consider readability throughout
  • minimizes the mess you have to deal with when you (finally) decide to publish
  • allows collaboration and support
  • facilitates sharing and re-use.

But what if someone scoops my code! I’m a revolutionary, they will steal my ideas!

If you are super paranoid, you can always opt for a private repository. It is your work & up to you. But consider the advantages!

Publishing unpublished data

  • If you have sensitive data…

  • Don’t include your data in your software repository (that’s not what they are for anyway).

  • Consider generating simulated data so your code can run regardless.

  • And for all data:

    • Your data should be separate from your code!
    • If your code references your data, consider a config or metadata file for these references.
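
The last two ideas can be sketched together: a small config file holds the data locations, and a simulated dataset stands in when the real data cannot be shared. Everything here is an assumption made for illustration: the config layout (a `"data"` mapping in a JSON file), the function names, and the `id`, `age`, and `treated` columns.

```python
import json
import random
from pathlib import Path


def load_data_paths(config_file):
    """Read dataset locations from a JSON config, relative to the config file."""
    config_file = Path(config_file)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    base = config_file.parent
    # Assumed config layout: {"data": {"patients": "data/patients.csv"}}
    return {name: base / rel for name, rel in config["data"].items()}


def simulate_patients(n, seed=0):
    """Generate fake records with the same columns as the (hypothetical) real data."""
    rng = random.Random(seed)  # a fixed seed keeps the simulated data reproducible
    return [
        {"id": i, "age": rng.randint(18, 90), "treated": rng.random() < 0.5}
        for i in range(n)
    ]
```

The analysis code then asks the config where the data lives, and falls back to `simulate_patients` when the real file is not available, so others can still run the full workflow.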

Where do I publish?

Living project: GitHub (or other social coding platform):

  • synergistic with the version control software Git

  • makes history public and accessible (eek!)

  • allows publication of different releases

  • provides a platform for interaction and collaboration

Archiving a release: Zenodo (or other stable repository, like the OSF)

  • direct archiving supported from GitHub to Zenodo

  • this gives you a DOI (digital object identifier): your code is citable!