Automation for Zenodo DOI #365

ewels opened this issue Aug 18, 2019 · 19 comments

@ewels
Member

ewels commented Aug 18, 2019

Zenodo DOIs are an excellent way to cite nf-core pipelines, especially as they give a specific DOI per version of the pipeline. However, there are two points with the current setup which are quite annoying:

  1. One of the nf-core admins has to manually set up the automated GitHub link for each new pipeline
  2. DOIs are given after a release. This means that the master branch then has to be updated to show the badge for the new DOI after the release is pushed. This changes the commit hash on master so that it no longer matches the release.
    • This is very slightly bad practice as we're no longer exactly the same as the release. But worse, it messes up functionality in nf-core list and elsewhere, which checks commit hashes of local clones to see if the latest release is being run.
    • Also bad - if people properly run the release (with the -r nextflow flag or by manually downloading), the bundled code cannot include any information about the proper DOI for citation. This will become more of an issue as we try to improve the ease of access to this information (see Add citation information following the citation file format #361)
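
To make the commit-hash problem concrete, here is a minimal sketch of the kind of comparison involved. It is not the actual `nf-core list` code; the function name and both hashes are hypothetical and only illustrate why a badge commit pushed after the release makes master look like "not the release".

```python
def is_running_latest_release(local_head_sha: str, release_tag_sha: str) -> bool:
    """Return True if the local clone's HEAD is exactly the released commit."""
    # Compare full hashes case-insensitively, ignoring stray whitespace.
    return local_head_sha.strip().lower() == release_tag_sha.strip().lower()


# Hypothetical hashes: the release was tagged at `release_sha`, then a
# follow-up commit on master added the DOI badge, producing `badge_sha`.
release_sha = "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"
badge_sha = "0f9e8d7c6b5a49382716f5e4d3c2b1a098765432"

print(is_running_latest_release(release_sha, release_sha))  # clone checked out at the tag
print(is_running_latest_release(badge_sha, release_sha))    # clone of master after the badge commit
```

Any check of this shape fails for a clone of master as soon as the badge commit lands, even though the pipeline code itself is unchanged.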

After a very, very quick skim read of the docs, I think that we should be able to solve both of these problems with what seems to be an excellent Zenodo API. I see two approaches:

Approach 1: Fully automate releases

  • We can create new resources for new pipelines: https://developers.zenodo.org/#create
  • We can reserve DOIs before publication. This can be done on the website and in the API (with the prereserve_doi flag), but not with the GitHub linkage.
  • We can then update the code with the new Zenodo badge and any other references to the DOI, commit this, then trigger the GitHub release using the GitHub API.

The downside is that this has to be done before the release. This means that we can't use the GitHub release web interface, but instead have to trigger the release programmatically somehow. This probably needs a little thought as to how to do it nicely. Also, whether it's worth it!
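
A rough sketch of what Approach 1 could look like, assuming the Zenodo deposition endpoint with the `prereserve_doi` metadata flag and the GitHub "create a release" payload fields (`tag_name`, `target_commitish`, `name`, `body`). The function names, the example response and the DOI in it are made up for illustration, and nothing here sends a request: it only builds the request and parses a response of the expected shape.

```python
import json
import urllib.request

ZENODO_DEPOSITIONS = "https://zenodo.org/api/deposit/depositions"


def build_reserve_request(token: str, title: str) -> urllib.request.Request:
    """Build (but do not send) a request creating a draft with a reserved DOI."""
    payload = {
        "metadata": {
            "title": title,
            "upload_type": "software",
            "prereserve_doi": True,  # ask Zenodo to reserve a DOI for the draft
        }
    }
    return urllib.request.Request(
        ZENODO_DEPOSITIONS,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def reserved_doi(deposition: dict) -> str:
    """Pull the reserved DOI out of a deposition response body."""
    return deposition["metadata"]["prereserve_doi"]["doi"]


def build_release_payload(tag: str, commit_sha: str, doi: str) -> dict:
    """JSON body for creating a GitHub release at the commit that contains the DOI."""
    return {
        "tag_name": tag,
        "target_commitish": commit_sha,
        "name": tag,
        "body": f"Cite this release as https://doi.org/{doi}",
    }


# Offline demonstration with a made-up response body (hypothetical DOI).
example_response = {
    "id": 1234567,
    "metadata": {"prereserve_doi": {"doi": "10.5281/zenodo.1234567", "recid": 1234567}},
}
doi = reserved_doi(example_response)
print(build_release_payload("1.0.0", "abc123", doi)["body"])
```

The ordering is the point: reserve the DOI, commit it into the repo, and only then create the release at that commit, so the tagged code already carries its own DOI.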

Approach 2: More manual DOI fetching, with lint checks

An alternative to this is that we can go fully the other way, and instead of using the automated linkage, manually pre-reserve the DOI on the Zenodo website before release. This would have to be done by the pipeline authors. We could potentially then get the lint tests to check for this when running with the --release flag to ensure that it happens properly.
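
As a sketch of what such a lint test might look like: the function below is hypothetical (not an existing nf-core lint test) and assumes the DOI is expected somewhere in the README text. It only checks that a Zenodo-style DOI (`10.5281/zenodo.<digits>`) is present when linting in release mode.

```python
import re
from typing import List

# Zenodo DOIs use the 10.5281 prefix followed by "zenodo." and a record number.
ZENODO_DOI_RE = re.compile(r"10\.5281/zenodo\.\d+")


def check_release_doi(readme_text: str, release_mode: bool) -> List[str]:
    """Return a list of lint failures; an empty list means the check passed."""
    if not release_mode:
        # Outside release mode a missing DOI is not an error.
        return []
    if ZENODO_DOI_RE.search(readme_text) is None:
        return ["README does not contain a Zenodo DOI (expected before release)"]
    return []


print(check_release_doi("No DOI here yet", release_mode=True))
print(check_release_doi("Cite: https://doi.org/10.5281/zenodo.1234567", release_mode=True))
```

This cannot tell whether the DOI belongs to the release being prepared, only that one has been filled in.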

Thoughts and feedback welcome!

Phil

@ewels added the "command line tools", "help wanted" and "linting" labels and removed the "command line tools" and "linting" labels, Aug 18, 2019
@sven1103
Member

Just thinking wildly here: why not just provide the top-level DOI that automatically routes to the latest Zenodo DOI version of a project, and avoid the hassle?

People can get the correct version DOI from the DOI authority easily, which is Zenodo in our case.

Hit me :D

@maxulysse
Member

You mean one like that: https://zenodo.org/badge/latestdoi/54024046
That's the one you can get with the first release with Zenodo.
But there's a way to reserve a DOI, so it should be usable before the release and shipped with it:
cf docs: https://help.zenodo.org/

Yes you can! On the upload page under Basic Information and Digital Object Identifier click the Reserve DOI button. The text field above will display the DOI that your record will have once it is published. This will not register the DOI yet, nor will it publish your record (so you can still update the files). This DOI can be safely used in the record's own content as well as any other separate datasets or papers you might be planning to publish.
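
For reference, the "latest DOI" link at the top of this comment can be turned into a README badge without knowing any version-specific DOI. A small sketch, assuming the number in the URL is the numeric ID Zenodo assigns from the linked GitHub repository; the helper names are made up.

```python
def zenodo_latest_doi_url(badge_id: int) -> str:
    """URL that redirects to the newest Zenodo record for the linked repository."""
    return f"https://zenodo.org/badge/latestdoi/{badge_id}"


def zenodo_badge_markdown(badge_id: int) -> str:
    """Markdown for a DOI badge that links to the latest record."""
    image = f"https://zenodo.org/badge/{badge_id}.svg"
    return f"[![DOI]({image})]({zenodo_latest_doi_url(badge_id)})"


print(zenodo_badge_markdown(54024046))
```

Because the ID never changes between releases, this badge can be committed once and left alone.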

@sven1103
Member

sven1103 commented Aug 19, 2019

You mean one like that: https://zenodo.org/badge/latestdoi/54024046
That's the one you can get with the first release with Zenodo.

Exactly.

But there's a way to reserve a DOI, so it should be usable before the release and shipped with it:
cf docs: https://help.zenodo.org/

Sure, just raising the question of whether adding another level of complexity is really necessary :)

@ewels
Member Author

ewels commented Aug 21, 2019

Yes this would definitely be easier, but we just made a big song and dance in the manuscript about how every release gets its own DOI 😅 I guess with the general one, each release would still get its own release-specific DOI, but it's just a little trickier for people to find it. If it's explicitly in the repo then it can be saved with the results in the upcoming citation file, which I like.

@sven1103
Member

sven1103 commented Aug 21, 2019

but we just made a big song and dance in the manuscript about how every release gets its own DOI

And this will be conserved. Zenodo will always create DOIs for every release. So we are still authentic ;)

but it's just a little trickier for people to find it

Ah well, you click on the link and choose the DOI from the version you used from the right panel in the webpage. The benefit is very little compared to the implementation hassle imho.

If it's explicitly in the repo then it can be saved with the results in the upcoming citation file, which I like.

ok, this is a point for which I don't have a solution yet.

Or we just reserve a DOI every time we merge to master, but don't publish (so the DOI does not go live). When the real GitHub release comes, we use this DOI, update the record content via the Zenodo API, and finally trigger the publishing via the API as well. This might work.
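
Sketched as the sequence of Zenodo API calls this would need, assuming the deposition endpoints for creating a draft, updating it, and the `actions/publish` action. The function names and the deposition ID are hypothetical, and the sketch only lists the requests rather than sending them.

```python
from typing import List, Tuple

ZENODO_DEPOSITIONS = "https://zenodo.org/api/deposit/depositions"


def merge_requests() -> List[Tuple[str, str]]:
    """On merge to master: create an unpublished draft, which reserves the DOI."""
    return [("POST", ZENODO_DEPOSITIONS)]


def release_requests(deposition_id: int) -> List[Tuple[str, str]]:
    """On GitHub release: update the draft, then publish it."""
    base = f"{ZENODO_DEPOSITIONS}/{deposition_id}"
    return [
        ("PUT", base),                         # update metadata for the release
        ("POST", f"{base}/actions/publish"),   # publishing makes the DOI live
    ]


# Hypothetical deposition ID, as returned when the draft was created.
for method, url in merge_requests() + release_requests(1234567):
    print(method, url)
```

The draft also needs its files uploaded before publishing, which is left out here.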

@apeltzer
Member

We really need this - it feels wrong to have to do this manually after doing a release and then manually adding it to the README when doing the very first release 😓

@sven1103
Member

@apeltzer I agree. Let's push this first please: #319, and agree on a common formal description of the release process. Then let's translate it into GitHub Actions. I am happy to write the script to do the Zenodo interaction, I love such stuff.

@apeltzer
Member

Agree that we should have this with #319, although that enforces proper Git commits everywhere too (there are plugins for Atom / Code / IntelliJ that help with that, e.g.: https://github.com/KnisterPeter/vscode-commitizen)

@ewels
Member Author

ewels commented Sep 23, 2019

although that enforces proper Git Commits everywhere too

I don't follow.. how come?

@ewels
Member Author

ewels commented Sep 23, 2019

Re-reading this now, I wonder if we are causing ourselves trouble and overcomplicating things massively here... Maybe we should just have the general DOI for the pipeline? Then if we develop the separate nf-core cite command, that can always pull the pipeline-specific DOI.

It certainly would be a hell of a lot easier.. 😰 (and less likely to cause problems)

@apeltzer
Member

Given how many projects are hitting me at the moment, I tend to agree. Maybe start small first and then make it bigger afterward?

@ewels
Member Author

ewels commented Sep 25, 2019

Ok, so let's shelve this and #319 for now then, if everyone is happy with that. And let's just start using the base Zenodo DOI everywhere. I guess we should document that somewhere...

@sven1103 are you happy with this? I know that you were getting excited about the automation 😅

@sven1103
Member

sven1103 commented Sep 26, 2019

I suggested the base DOI in the first place (#365 (comment)), so of course I am happy with it 😂

@sven1103
Member

But thank you @ewels for appreciating my excitement about the implementation :P

@mribeirodantas
Member

Maybe this?

https://github.com/gbif/gbif-doi
https://github.com/gbif/datacite-rest-client

Check "Create an identifier in Draft state" in https://support.datacite.org/docs/api-create-dois

"To reserve an identifier in Draft state, you will need to ..."

@jfy133
Member

jfy133 commented Sep 23, 2022

GitHub Actions: https://github.com/ivotron/zenodo/

@ewels
Member Author

ewels commented Sep 26, 2022

GitHub Actions: https://github.com/ivotron/zenodo/

Looks nice but hasn't been updated in 3 years, which is forever with GitHub Actions. I don't recognise the syntax of the example at all... 👀 Also it doesn't show up in the GitHub Marketplace for actions, so pretty sure it won't work.

There are a few that do though: https://github.com/marketplace?type=actions&query=zenodo

@apeltzer
Member

I will check a bit more to figure out what might be most suitable for what we actually want to have...

@FriederikeHanssen
Contributor

Just talking with @maxulysse about this, after the Zenodo ID issue (again) from this morning. We used the Zenodo API here directly: https://github.com/nf-core/sarek/blob/master/.github/workflows/upload.py which is working pretty well so far.

We are not trying to reserve a pipeline ID, but just publishing files. If someone has time, maybe yet another angle to investigate if it's worthwhile.
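
For anyone picking this up, a minimal sketch of a direct file upload to a Zenodo draft. This is not the content of the linked `upload.py`; it assumes the bucket-style upload, where the deposition response exposes a bucket URL under `links.bucket` and files are sent with `PUT <bucket>/<filename>`. The function name, bucket URL and token are placeholders, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def build_upload_request(
    bucket_url: str, filename: str, data: bytes, token: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT request uploading one file to a draft's bucket."""
    # Percent-encode the filename so spaces and similar characters are URL-safe.
    url = f"{bucket_url.rstrip('/')}/{urllib.parse.quote(filename)}"
    return urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {token}"},
        method="PUT",
    )


# Placeholder bucket URL; the real one comes from the deposition's `links.bucket`.
bucket = "https://zenodo.org/api/files/00000000-0000-0000-0000-000000000000"
req = build_upload_request(bucket, "results v1.tar.gz", b"example bytes", "secret")
print(req.get_method(), req.full_url)
```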

@apeltzer apeltzer removed their assignment Sep 27, 2024

7 participants