Reproducible Research Overview


We have covered many of the practical aspects of achieving reproducibility.

[Figure: pillars]

Best Practice

In order of importance.

  1. Code should be in Quarto/Rmarkdown. This should encode everything from start to finish.
  2=. R and R packages should be maintained with Renv (and archived).
  2=. Python and command-line tools should be maintained with Conda (and archived).
  4. Git and GitHub will be used to track and share everything.
  5. Docker can be used to containerize system setups between users.
  6. Final analyses should be uploaded to Zenodo or something similar.

Towards a standard operating procedure

Though the best practices are exactly that, best practices, how you apply them depends on context.

It is okay if you do not hit every mark perfectly for every plot. As we discussed in the introduction, context matters.

We will run through some examples.

Case Studies

I am starting my PhD project which I will work on with several other lab members.

  • Check the genome versions used by your lab. If there is a FASTA/GTF file, do you have a record of how it was generated?
  • Does your lab use any pipelines/workflows? Find out the versions of the tools used in these workflows.
  • Start a new Renv and Conda environment for the project. This way you have the newest versions of the tools. If possible, match the versions used in the lab's workflows.
  • Create a GitHub repository. Share the yml/lock files (for example, Conda's environment.yml and Renv's renv.lock) so everyone works from the same environment.
  • Optionally, create a Docker image with everything included in one place.
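
Matching versions between the lab's workflows and a new project environment can be checked mechanically. The following is a minimal Python sketch, not a prescribed tool: it assumes each environment is available as a list of Conda-style `name=version` pins, and the helper names (`parse_pins`, `version_mismatches`) and the example pins are hypothetical.

```python
def parse_pins(lines):
    """Turn Conda-style 'name=version' strings into a {name: version} dict."""
    pins = {}
    for line in lines:
        spec = line.strip().lstrip("- ").strip()
        if not spec or spec.startswith("#"):
            continue
        name, _, rest = spec.partition("=")
        # Accept 'name=1.2', 'name==1.2' and 'name=1.2=build'; keep only the version.
        version = rest.lstrip("=").split("=")[0].strip()
        pins[name.strip()] = version or None
    return pins


def version_mismatches(workflow, project):
    """Return {tool: (workflow_version, project_version)} for tools pinned differently."""
    shared = workflow.keys() & project.keys()
    return {
        tool: (workflow[tool], project[tool])
        for tool in sorted(shared)
        if workflow[tool] != project[tool]
    }


# Hypothetical pins, for illustration only.
lab_workflow = parse_pins(["- samtools=1.17", "- bowtie2=2.5.1", "- macs2=2.2.9.1"])
my_project = parse_pins(["- samtools=1.19", "- bowtie2=2.5.1", "- deeptools=3.5.4"])

print(version_mismatches(lab_workflow, my_project))
# {'samtools': ('1.17', '1.19')}
```

Tools that appear in only one environment are ignored here; only shared tools with different pins are reported.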

Case Studies

I am starting my PhD project which I will work on by myself.

The same steps apply as when working collaboratively. You will want a record of your packages and version control through GitHub for publication regardless (and for your own sanity).

Plus, you may gain collaborators as you go, and having this set up will make that process easier.
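
For the Python side of a project, a record of packages can be as simple as a small JSON file saved next to the results, similar in spirit to `sessionInfo()` in R. This sketch uses only the standard library; the function name `session_record` and the example package names are assumptions for illustration, and it complements rather than replaces the Renv and Conda lock files.

```python
import json
import platform
from importlib import metadata


def session_record(packages):
    """Collect interpreter, OS and package versions for the given package names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# "pip" and "numpy" are example names; list whatever your analysis imports.
record = session_record(["pip", "numpy"])
print(json.dumps(record, indent=2))
```

Writing the same dictionary to a file with `json.dump` at the end of each analysis run keeps the record alongside the outputs it describes.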

Case Studies

I am submitting a paper.

Did you keep a good record of your work? If yes, then this is easy. If no:

  • Double check you can actually regenerate your figures using your code.
  • Keep track of package versions, etc. while you reprocess.
  • Compile the finalized code in a GitHub repository, then create a release and archive it on Zenodo.
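
One way to double check regeneration is to compare checksums of the original outputs against the outputs of a fresh rerun. The sketch below is an illustration under assumptions: the function names and file names are hypothetical, and it compares the data tables behind each figure rather than the images, because image files such as PDFs may embed timestamps and differ byte for byte even when the plot looks identical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_outputs(original_dir, regenerated_dir):
    """List file names whose contents differ, or are missing, after a rerun."""
    changed = []
    for original in sorted(Path(original_dir).iterdir()):
        if not original.is_file():
            continue  # this sketch only compares top-level files
        rerun = Path(regenerated_dir) / original.name
        if not rerun.is_file() or sha256_of(original) != sha256_of(rerun):
            changed.append(original.name)
    return changed


# Tiny demonstration with throwaway files standing in for real outputs.
with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp, "old"), Path(tmp, "new")
    old.mkdir()
    new.mkdir()
    (old / "fig1_data.csv").write_text("x,y\n1,2\n")
    (new / "fig1_data.csv").write_text("x,y\n1,2\n")
    (old / "fig2_data.csv").write_text("x,y\n3,4\n")
    (new / "fig2_data.csv").write_text("x,y\n3,5\n")
    result = changed_outputs(old, new)

print(result)  # ['fig2_data.csv']
```

An empty list means every original output was reproduced exactly; anything listed is worth inspecting before submission.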

Case Studies

I have built an ATAC-seq pipeline for the whole lab to use.

  • Create a Docker image with your code, packages and everything else included in one place.
  • Share the recipe with others in the lab so they can match the pipeline's tool versions in downstream analysis.
  • When ready, create a Zenodo record so that anyone who uses the pipeline for a publication has a DOI to cite in the methods.

Our Example

“Quick one-line queries or plots of results (that won't be published) will not need to have a record of your system info, version numbers etc. They are not worth the time investment.”

If you already have Rmarkdown, Renv and Conda set up, the extra effort needed to stay reproducible is minimal. That way, even a quick plot will never be completely orphaned and unusable for publication.

Challenges remain

  • Reproducibility skills are not prioritized
  • Funding is fixated on novel research directions
  • Journals have no incentive to make sure any code provided is actually good

Reproducibility is undervalued. This means it is up to us, the research community, to hold ourselves accountable.