Transparent Programming: Important Habits for Reproducibility and Research Integrity

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. - J. Buckheit and D. Donoho

Welcome

This course, “Transparent Programming: Important Habits for Reproducibility and Research Integrity,” is a hands-on exploration of robust and ethical programming practices. It focuses on building proficiency with R and Git/GitHub in collaborative, open-source environments. Participants will explore the ethos of reproducible research, learn why transparency in coding matters, and see how these practices connect to scientific integrity. A cornerstone of the course is building a reproducible simulation, which offers an applied perspective on version control, debugging, and transparent coding.

By the end of this workshop, you should be able to publish a research project that includes every component of our Computational Reproducibility Checklist:

  • Data Source and Accessibility
    • Description of data source.
    • Explanation for data non-disclosure (if applicable).
    • Provision of “blurred” or anonymized data (if possible).
  • Data Dictionary
    • Detailed specification of all raw data variables.
    • Formulas for secondary variables.
    • Mapping between paper’s variable names and code’s variable names.
  • Analysis Process Documentation
    • Step-by-step guide from raw data to final results.
    • Details on data preparation, exclusions, and transformations.
    • Description of pre-processing steps (e.g., normalization, imputations).
  • Code and Software Dependencies
    • E-supplement of all code segments.
    • Code documentation for clarity.
    • Log files from the execution of the code.
    • List of all software and libraries with exact version numbers.
  • Synthetic Dataset
    • Mock dataset with the structure of the original but artificial values.
  • Dependency Management
    • requirements.txt (Python) or renv.lock (R) for library dependencies.
    • Dockerfile or similar for replicating the computing environment.
  • Reproducibility Guide
    • Document guiding reproduction of results.
    • Steps for setting up the environment, data loading, code execution, and output interpretation.
  • Results and Outputs
    • Explicit statement of where outputs are written after the code runs.
    • Documentation of intermediate outputs or checkpoints.
  • Open Science Practices
    • Use of open-source tools and platforms.
    • Link to a version-controlled repository (e.g., GitHub) for active code maintenance.
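Two checklist items, the synthetic dataset and the list of software with exact version numbers, lend themselves to a short illustration. The following is a minimal sketch in Python using only the standard library; the same idea carries over to R. The schema, column names, value ranges, and function names are all hypothetical and chosen for illustration only. They are not part of the workshop material.

```python
import csv
import io
import platform
import random

# Hypothetical schema describing the structure of an original dataset:
# column name -> (kind, specification). All names and ranges are assumptions.
SCHEMA = {
    "participant_id": ("id", None),
    "age": ("int", (18, 90)),
    "group": ("category", ("control", "treatment")),
    "score": ("float", (0.0, 100.0)),
}


def make_synthetic_rows(schema, n_rows, seed):
    """Generate artificial rows that follow the schema but contain no real data."""
    # A dedicated, seeded generator makes the mock data identical on every run.
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        row = {}
        for name, (kind, spec) in schema.items():
            if kind == "id":
                row[name] = i + 1
            elif kind == "int":
                row[name] = rng.randint(spec[0], spec[1])
            elif kind == "float":
                row[name] = round(rng.uniform(spec[0], spec[1]), 2)
            elif kind == "category":
                row[name] = rng.choice(spec)
            else:
                raise ValueError(f"Unknown column kind: {kind!r}")
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, with a header line of column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def environment_record():
    """Record the interpreter version and platform alongside the results."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
```

Calling `make_synthetic_rows(SCHEMA, 100, seed=2024)` twice returns identical rows, which is the property that lets a reader rerun the analysis code on the mock data and obtain the same intermediate outputs. Storing the output of `environment_record()` next to the results is one simple way to document the software versions that were used. For third-party packages, a lock file such as requirements.txt or renv.lock remains the more complete record.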

Our workshop material is licensed under a Creative Commons Attribution 4.0 International License.

Acknowledgements

This workshop material emerges from the collaborative efforts of two dedicated individuals deeply passionate about computational reproducibility and fostering research integrity.

Key Coordinators: