From 4b6d6357a08210e42082bdb2fbb3fed5cbeb8e7d Mon Sep 17 00:00:00 2001
From: GitHub Actions
Date: Fri, 16 Aug 2024 08:21:59 +0000
Subject: [PATCH] site deploy

Auto-generated via {sandpaper}
Source  : 444f6cad8850593c93975654db7dc9345b4ee782
Branch  : md-outputs
Author  : GitHub Actions
Time    : 2024-08-16 08:21:40 +0000
Message : markdown source builds

Auto-generated via {sandpaper}
Source  : 46dbcd2064d63f99f5d311c1d0a2d2ec9aa90cd4
Branch  : main
Author  : Andy Turner
Time    : 2024-08-16 08:20:53 +0000
Message : Merge pull request #241 from jcohen02/fix/issue231

Updates to the reproducibility section to address #231
---
 advanced-containers.html                  | 10 ++--
 aio.html                                  | 71 ++++++++++++++++-------
 creating-container-images.html            |  6 +-
 docker-hub.html                           |  2 +-
 instructor/advanced-containers.html       | 10 ++--
 instructor/aio.html                       | 71 ++++++++++++++++-------
 instructor/creating-container-images.html |  6 +-
 instructor/docker-hub.html                |  2 +-
 instructor/reproduciblity.html            | 55 +++++++++++++-----
 md5sum.txt                                |  2 +-
 pkgdown.yml                               |  2 +-
 reproduciblity.html                       | 55 +++++++++++++-----
 12 files changed, 200 insertions(+), 92 deletions(-)

diff --git a/advanced-containers.html b/advanced-containers.html
index e695bb09..80175941 100644
--- a/advanced-containers.html
+++ b/advanced-containers.html
@@ -439,7 +439,7 @@

Running containers

the Python Wiki and is set to add all numbers that are passed to it as arguments.

@@ -633,7 +633,7 @@

Exercise: Checking the options

Container Granularity


As mentioned above, one of the decisions you may need to make when @@ -548,7 +575,7 @@

Positives and negatives

Containers in Research Workflows: Reproducibility and Granularity

-

Last updated on 2024-08-01 | +

Last updated on 2024-08-16

@@ -421,24 +421,33 @@

Work in progress…

By reproducibility here we mean the ability of someone else (or your future self) being able to reproduce what you did computationally at a particular time (be this in research, analysis or -something else) as closely as possible even if they do not have access +something else) as closely as possible, even if they do not have access to exactly the same hardware resources that you had when you did the original work.

+

What makes this especially important? With research being +increasingly digital in nature, more and more of our research outputs +are a result of the use of software and data processing or analysis. +With complex software stacks or groups of dependencies often being +required to run research software, we need approaches to ensure that we +can make it as easy as possible to recreate an environment in which a +given research process was undertaken. There are many reasons why this +matters, one example being someone wanting to reproduce the results of a +publication in order to verify them and then build on that research.

Some examples of why containers are an attractive technology to help with reproducibility include:

-
  • The same computational work can be run across multiple different -technologies seamlessly (e.g. Windows, macOS, Linux).
  • +
    • The same computational work can be run seamlessly on different +operating systems (e.g. Windows, macOS, Linux).
    • You can save the exact process that you used for your computational work (rather than relying on potentially incomplete notes).
    • You can save the exact versions of software and their dependencies in the container image.
    • -
    • You can access legacy versions of software and underlying +
    • You can provide access to legacy versions of software and underlying dependencies which may not be generally available any more.
    • Depending on their size, you can also potentially store a copy of key data within the container image.
    • -
    • You can archive and share the container image as well as associating -a persistent identifier with a container image to allow other -researchers to reproduce and build on your work.
    • +
    • You can archive and share a container image as well as associating a +persistent identifier with it, to allow other researchers to reproduce +and build on your work.

Sharing images


As we have already seen, the Docker Hub provides a platform for @@ -446,8 +455,8 @@

Work in progress…BASH
  • When you publish work (in whatever way) use an archiving and DOI service such as Zenodo to make sure your container image is captured as -it was used for the work and that is obtains a persistent DOI to allow -it to be cited and referenced properly.
  • +it was used for the work and that it is assigned a persistent DOI to +allow it to be cited and referenced properly. +
  • Make use of tags when naming your container images; this ensures +that if you update the image in future, previous versions can be +retained within a container repository to be easily accessed, if this is +required.
  • +
  • A built and archived container image can ensure a persistently +bundled set of software and dependencies. However, a +Dockerfile provides a lightweight means of storing a +container definition that can be used to re-create a container image at +a later time. If you’re taking this approach, ensure that you specify +software package and dependency versions within your +Dockerfile rather than just specifying package names, which +will generally install the most up-to-date version of a package. This +may be incompatible with other elements of your software stack. Also +note that storing only a Dockerfile presents +reproducibility challenges because required versions of packages may not +be available indefinitely, potentially meaning that you’re unable to +reproduce the required environment and, hence, the research +results.
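
The recommendations in the list above can be combined in a short shell sketch. It is only an illustration under stated assumptions: the image name `alice/analysis`, the tag `v1.0.0`, the base image and the package versions are hypothetical and do not come from the lesson. The script writes a `Dockerfile` that pins the base image tag and each package version, and defines a helper, `build_and_archive`, that builds the tagged image and exports it to a tar file that could then be uploaded to an archiving service such as Zenodo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical image name and tag, chosen only for illustration.
IMAGE="alice/analysis"
TAG="v1.0.0"

# Temporary build directory holding the Dockerfile and the exported archive.
workdir="$(mktemp -d)"

# Pin the base image tag and every package version (illustrative versions),
# rather than listing bare package names that resolve to the latest release.
cat > "$workdir/Dockerfile" <<'EOF'
FROM python:3.12-slim
RUN pip install --no-cache-dir numpy==1.26.4 pandas==2.2.2
EOF

# Build the image under an explicit tag, then export it to a tar file
# so that the exact image used for the work can be archived.
build_and_archive() {
  docker build -t "$IMAGE:$TAG" "$workdir"
  docker image save -o "$workdir/analysis-$TAG.tar" "$IMAGE:$TAG"
}
```

Calling `build_and_archive` requires a working Docker installation. The resulting tar file can be loaded again later with `docker image load -i`, which restores the image under the same name and tag.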
  • Container Granularity


    As mentioned above, one of the decisions you may need to make when @@ -546,7 +573,7 @@

    Positives and negatives