-
Notifications
You must be signed in to change notification settings - Fork 714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CI to build mulled containers with Wave #4080
base: master
Are you sure you want to change the base?
Conversation
Why don't we do that already? |
Just hadn't gotten around to it 😆 I'm also having trouble authenticating with the registry. I didn't think it would have to go through tower if it was public. @pditommaso could you confirm? I think if not we can use Tower to build the images in CI using |
How does that impact offline use of pipelines? A mulled container can easily be downloaded with |
- bioconda::bowtie=1.3.0 | ||
- bioconda::samtools=1.16.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bioconda::
probably not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky point, to include package compatible with ARM64, we are maintaining a seqera
channel forked from bioconda. Therefore it be useful to add seqera
in the channels and remove the bioconda::
prefix.
However, once bioconda will support ARM64, the goal is to merge those recipes there.
That was @ewels concern as well! I think But I think I misunderstood that the wave builds are happening locally, but I believe they're happening on a web server. Ideally this would just be for building the containers, and then end users will be able to pull the container normally, unless they modify the environment. |
I believe this is a supported feature of Wave Freeze and Build time is happening at the Wave server, but this should be functionally equivalent to running nf-core download. @MatthiasZepper, would you agree? |
I am not in Barcelona and probably somewhat out of the loop. Additionally, I have no experience using Wave, @mribeirodantas Bytesize talk is essentially all I know about it. In general, I have no objections against abandoning Bioconda's CI in favour of Wave. Coincidentally, I learned yesterday why Oxford Nanopore has little confidence in Bioconda and that they are strongly favouring own distribution channels. Moving away from Bioconda allows ONT to support AArch64 natively and for a better provenance in dependencies. Taking into account the importance of ONT software in nf-core pipelines, I think that alone justifies allowing for other software sources and container registries in modules, including building them with Wave.
The ability to run offline is indeed paramount for us at NGI. We would need to maintain our own forks of the pipelines with patched modules, if that option was dropped from the nf-core pipelines/modules. From which registry the container is downloaded/pulled is pretty much irrelevant for us (a non-free/non-sponsored service might be a slight downer, though), but might matter for scientists in other countries that are under international sanctions (e.g. Iran, Russia etc.?)
Tools 2.10 is simply extracting the container definitions from the modules & configs with Regexes. Subsequently, it is downloading all that start with But the decision of abandoning the regexes in favour of My only concern is that the pipeline, when executed, will not find the appropriate container without contacting the Wave server asking for its hash respectively the required layers. But I believe this concern is actionable by the Nextflow/Wave plugin developers? |
tests/config/nextflow.config
Outdated
wave.enabled = true | ||
wave.strategy = ['dockerfile', 'conda', 'container'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wave.enabled = true | |
wave.strategy = ['dockerfile', 'conda', 'container'] | |
wave.enabled = true | |
wave.strategy = ['conda', 'container'] | |
wave.freeze = true | |
wave.build.repository = 'docker.io/emiller88/modules' |
Singularity native requires the freeze mode, which in turn requires also the target build repository
tests/config/nextflow.config
Outdated
@@ -28,6 +30,9 @@ if ("$PROFILE" == "singularity") { | |||
docker.userEmulation = false | |||
docker.fixOwnership = true | |||
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g)' | |||
wave.enabled = true | |||
wave.strategy = ['dockerfile', 'conda', 'container'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wave.strategy = ['dockerfile', 'conda', 'container'] | |
wave.strategy = 'conda' |
Likely you want only rely on conda
packages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a few modules with dockerfiles (bclconvert off the top of my head) and a few that use images only and don't have conda packages (parabricks I think is this way)
tests/config/nextflow.config
Outdated
@@ -28,6 +30,9 @@ if ("$PROFILE" == "singularity") { | |||
docker.userEmulation = false | |||
docker.fixOwnership = true | |||
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g)' | |||
wave.enabled = true | |||
wave.strategy = ['dockerfile', 'conda', 'container'] | |||
wave.build.repository = 'docker.io/emiller88/modules' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not strictly necessary. if omitted it used Wave default registry (tho images will only be cached for one week)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd want to not rebuild them unless the environment changes.
The other issue is it would make more sense to have one container repo per module.
Currently it looks like wave creates a new tag with the env name rather than a image per env.
Currently, without credentials you will be limited to 25 builds per day. We are discussing how to allow a community registry for nf-core pipelines. |
Would a combination of Wave with Kraken be an option? There are enough academic institutions represented in the nf-core community that could sponsor a server/mirror or two, so we could host the images in a peer-to-peer network after they were build with Wave? Unfortunately, this does seemingly not support Singularity images... |
modules/nf-core/bowtie/align/main.nf
Outdated
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
'https://depot.galaxyproject.org/singularity/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:c84c7c55c45af231883d9ff4fe706ac44c479c36-0' : | ||
'biocontainers/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:c84c7c55c45af231883d9ff4fe706ac44c479c36-0' }" | ||
'oras://community.wave.seqera.io/library/bowtie_samtools:16f00b34cfc72399' : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will break nf-core download
. I'd prefer the https links if possible, and should test the download if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a Seqera feedback link for the https links?
@@ -3,9 +3,10 @@ process BOWTIE_ALIGN { | |||
label 'process_high' | |||
|
|||
conda "${moduleDir}/environment.yml" | |||
container "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add in ARM builds as well (may need to tolerate + log failures).
Also, is it possible to parallelise the different requests in separate jobs? Would go 4x faster then.
- name: Create a registry name | ||
uses: actions/github-script@v7 | ||
id: registry-name | ||
with: | ||
result-encoding: string | ||
script: | | ||
return '${{ matrix.files }}'.replace('modules/nf-core/', '').replace('/environment.yml', '').replace('/', '_'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to have name
in the conda environment, right? We can remove if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this was to have a different wave repo per module. We can keep this for anything built by a Dockerfile for now(bclconvert, cellranger, etc.)
It'd be nice to also write the container URIs to the Note that we haven't decided on a spec for how this should look in the meta YAML file yet, so need to do that. Should be easy to agree though, I have a rough idea in my head already. |
Sounds good! We can log it in the CI definitely, or maybe we output it to files like bioconda does.
Already is! It just skips the step depending on the matrix. |
Closes #4077