Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI to build mulled containers with Wave #4080

Open
wants to merge 84 commits into
base: master
Choose a base branch
from
Open

Add CI to build mulled containers with Wave #4080

wants to merge 84 commits into from

Conversation

edmundmiller
Copy link
Contributor

@edmundmiller edmundmiller commented Oct 17, 2023

Closes #4077

@edmundmiller edmundmiller changed the title Build containers with wave Replace mulled containers with wave Oct 17, 2023
@maxulysse
Copy link
Member

Why don't we do that already?

@edmundmiller
Copy link
Contributor Author

Why don't we do that already?

Just hadn't gotten around to it 😆

I'm also having trouble authenticating with the registry. I didn't think it would have to go through tower if it was public. @pditommaso could you confirm?

I think if not we can use Tower to build the images in CI using nextflow inspect if the Dockerfile or environment.yml changes and push the image for people.

@edmundmiller edmundmiller marked this pull request as ready for review October 18, 2023 08:24
@edmundmiller edmundmiller requested a review from a team as a code owner October 18, 2023 08:24
@MatthiasZepper
Copy link
Member

MatthiasZepper commented Oct 18, 2023

How does that impact offline use of pipelines? A mulled container can easily be downloaded with nf-core download. How about those built with Wave? Will this still work?

- bioconda::bowtie=1.3.0
- bioconda::samtools=1.16.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bioconda:: probably not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky point, to include package compatible with ARM64, we are maintaining a seqera channel forked from bioconda. Therefore it be useful to add seqera in the channels and remove the bioconda:: prefix.

However, once bioconda will support ARM64, the goal is to merge those recipes there.

@edmundmiller
Copy link
Contributor Author

How does that impact offline use of pipelines? A mulled container can easily be downloaded with nf-core download. How about those built with Wave? Will this still work?

That was @ewels concern as well! I think nextflow inspect might be able to handle this in theory.

But I think I misunderstood that the wave builds are happening locally, but I believe they're happening on a web server.

Ideally this would just be for building the containers, and then end users will be able to pull the container normally, unless they modify the environment.

@adamrtalbot
Copy link
Contributor

adamrtalbot commented Oct 18, 2023

I believe this is a supported feature of Wave Freeze and nextflow inspect -concretize, however I think if it can't run offline this feature is a no-go, so we should triple check.

Build time is happening at the Wave server, but this should be functionally equivalent to running nf-core download. @MatthiasZepper, would you agree?

@MatthiasZepper
Copy link
Member

MatthiasZepper commented Oct 18, 2023

I am not in Barcelona and probably somewhat out of the loop. Additionally, I have no experience using Wave, @mribeirodantas Bytesize talk is essentially all I know about it.

In general, I have no objections against abandoning Bioconda's CI in favour of Wave. Coincidentally, I learned yesterday why Oxford Nanopore has little confidence in Bioconda and that they are strongly favouring own distribution channels. Moving away from Bioconda allows ONT to support AArch64 natively and for a better provenance in dependencies. Taking into account the importance of ONT software in nf-core pipelines, I think that alone justifies allowing for other software sources and container registries in modules, including building them with Wave.

I believe this is a supported feature of Wave Freeze and nextflow inspect -concretize, however I think if it can't run offline this feature is a no-go, so we should triple check.

The ability to run offline is indeed paramount for us at NGI. We would need to maintain our own forks of the pipelines with patched modules, if that option was dropped from the nf-core pipelines/modules. From which registry the container is downloaded/pulled is pretty much irrelevant for us (a non-free/non-sponsored service might be a slight downer, though), but might matter for scientists in other countries that are under international sanctions (e.g. Iran, Russia etc.?)

I believe this is a supported feature of Wave Freeze and nextflow inspect -concretize. Build time is happening at the Wave server, but this should be functionally equivalent to running nf-core download.

Tools 2.10 is simply extracting the container definitions from the modules & configs with Regexes. Subsequently, it is downloading all that start with http:// paths and pulling the remainder. I am not aware of any code that would support Wave builds as of now.

But the decision of abandoning the regexes in favour of nextflow inspect for future releases is set in stone and I presume you are actually implementing that already @adamrtalbot & Marcel? In that case, nf-core download could probably also be enabled to resolve any Wave instructions and also kick off builds and obtain images from a dynamically created container URI if needed?

My only concern is that the pipeline, when executed, will not find the appropriate container without contacting the Wave server asking for its hash respectively the required layers. But I believe this concern is actionable by the Nextflow/Wave plugin developers?

Comment on lines 17 to 18
wave.enabled = true
wave.strategy = ['dockerfile', 'conda', 'container']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wave.enabled = true
wave.strategy = ['dockerfile', 'conda', 'container']
wave.enabled = true
wave.strategy = ['conda', 'container']
wave.freeze = true
wave.build.repository = 'docker.io/emiller88/modules'

Singularity native requires the freeze mode, which in turn requires also the target build repository

@@ -28,6 +30,9 @@ if ("$PROFILE" == "singularity") {
docker.userEmulation = false
docker.fixOwnership = true
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g)'
wave.enabled = true
wave.strategy = ['dockerfile', 'conda', 'container']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wave.strategy = ['dockerfile', 'conda', 'container']
wave.strategy = 'conda'

Likely you want only rely on conda packages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a few modules with dockerfiles (bclconvert off the top of my head) and a few that use images only and don't have conda packages (parabricks I think is this way)

@@ -28,6 +30,9 @@ if ("$PROFILE" == "singularity") {
docker.userEmulation = false
docker.fixOwnership = true
docker.runOptions = '--platform=linux/amd64 -u $(id -u):$(id -g)'
wave.enabled = true
wave.strategy = ['dockerfile', 'conda', 'container']
wave.build.repository = 'docker.io/emiller88/modules'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not strictly necessary. if omitted it used Wave default registry (tho images will only be cached for one week)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd want to not rebuild them unless the environment changes.

The other issue is it would make more sense to have one container repo per module.

Currently it looks like wave creates a new tag with the env name rather than a image per env.

@pditommaso
Copy link
Contributor

I'm also having trouble authenticating with the registry. I didn't think it would have to go through tower if it was public. @pditommaso could you confirm?

Currently, without credentials you will be limited to 25 builds per day. We are discussing how to allow a community registry for nf-core pipelines.

@MatthiasZepper
Copy link
Member

MatthiasZepper commented Oct 19, 2023

Currently, without credentials you will be limited to 25 builds per day. We are discussing how to allow a community registry for nf-core pipelines.

Would a combination of Wave with Kraken be an option? There are enough academic institutions represented in the nf-core community that could sponsor a server/mirror or two, so we could host the images in a peer-to-peer network after they were build with Wave? Unfortunately, this does seemingly not support Singularity images...

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:c84c7c55c45af231883d9ff4fe706ac44c479c36-0' :
'biocontainers/mulled-v2-ffbf83a6b0ab6ec567a336cf349b80637135bca3:c84c7c55c45af231883d9ff4fe706ac44c479c36-0' }"
'oras://community.wave.seqera.io/library/bowtie_samtools:16f00b34cfc72399' :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will break nf-core download. I'd prefer the https links if possible, and should test the download if we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a Seqera feedback link for the https links?

modules/nf-core/bowtie/align/environment.yml Outdated Show resolved Hide resolved
@@ -3,9 +3,10 @@ process BOWTIE_ALIGN {
label 'process_high'

conda "${moduleDir}/environment.yml"
container ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental?

Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add in ARM builds as well (may need to tolerate + log failures).

Also, is it possible to parallelise the different requests in separate jobs? Would go 4x faster then.

Comment on lines +54 to +60
- name: Create a registry name
uses: actions/github-script@v7
id: registry-name
with:
result-encoding: string
script: |
return '${{ matrix.files }}'.replace('modules/nf-core/', '').replace('/environment.yml', '').replace('/', '_');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to have name in the conda environment, right? We can remove if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was to have a different wave repo per module. We can keep this for anything built by a Dockerfile for now(bclconvert, cellranger, etc.)

@ewels
Copy link
Member

ewels commented Sep 18, 2024

It'd be nice to also write the container URIs to the meta.yml file. No problem for it to be duplicated for now, means we can just remove from the main.nf file later. See also #6659 (comment)

Note that we haven't decided on a spec for how this should look in the meta YAML file yet, so need to do that. Should be easy to agree though, I have a rough idea in my head already.

@edmundmiller
Copy link
Contributor Author

edmundmiller commented Sep 20, 2024

We should add in ARM builds as well (may need to tolerate + log failures).

Sounds good! We can log it in the CI definitely, or maybe we output it to files like bioconda does.

Also, is it possible to parallelise the different requests in separate jobs? Would go 4x faster then.

Already is! It just skips the step depending on the matrix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Build "mulled" containers with wave
10 participants