Register .zstd extension for zstd-compressed files #7032
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yay !
Thanks for the proposed fix, @polinaeterna.
Just a naive question: is .zstd a valid extension? I thought the only one was .zst according to its specification...
@albertvillanova hm I don't know tbh, it's just that the "mlfoundations/dclm-baseline-1.0" dataset contains files with this extension and these files seem to be valid
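One way to check that such a file is valid regardless of its extension is to look at its first four bytes. This is a minimal sketch, not something taken from this PR: `looks_like_zstd` is a hypothetical helper, and it only tests for the standard Zstandard frame magic number (0xFD2FB528, stored little-endian) defined in RFC 8878. It does not handle skippable frames and does not decompress anything.

```python
# Magic number of a standard Zstandard frame (0xFD2FB528), little-endian on disk.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def looks_like_zstd(data: bytes) -> bool:
    """Return True if `data` starts with the standard zstd frame magic number.

    Hypothetical helper for illustration; it inspects only the first 4 bytes.
    """
    return data[:4] == ZSTD_MAGIC


def file_looks_like_zstd(path: str) -> bool:
    """Read the first 4 bytes of a local file and check them against the magic."""
    with open(path, "rb") as f:
        return looks_like_zstd(f.read(4))
```

A file named `*.zstd` that passes this check is at least framed like zstd data, which matches the observation that the files "seem to be valid" even though the extension is non-standard.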
not sure why CI is failing but seems to be unrelated to this pr? can I merge @lhoestq @albertvillanova ?
yes you can merge, the CI failure is unrelated (surely an issue with hub-ci)
I have read the specs, and the only official extension is ".zst": https://datatracker.ietf.org/doc/html/rfc8878#section-7.1
File extension(s): zst
I don't think we should support wrongly typed extensions, unless they are commonly used.
Does it not make sense to ask people in mlfoundations to fix their file extensions?
ah why not, you could try opening a PR. btw there is a channel with them at (internal) https://app.slack.com/client/T1RCG4490/C079AKTV11P if you want to let them know
@lhoestq, was your previous comment addressed to me or Polina? @polinaeterna let me know if it is OK for you.
Should we close this PR then?
For example, https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0 dataset files have the .zstd extension, which is currently ignored (only .zst is registered).
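To illustrate what "registered" means here, the following is a rough sketch of extension-based compression detection. It is not the library's actual code: `EXTENSION_TO_COMPRESSION` and `infer_compression` are hypothetical names, and the mapping contents are assumed for the example.

```python
from pathlib import PurePosixPath

# Hypothetical registry of file extensions; only ".zst" maps to zstd,
# mirroring the situation described above.
EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".xz": "xz",
    ".zst": "zstd",
}


def infer_compression(filename, mapping=EXTENSION_TO_COMPRESSION):
    """Return the compression name for `filename`, or None if its
    last extension is not in the mapping."""
    suffix = PurePosixPath(filename).suffix.lower()
    return mapping.get(suffix)


# Registering the extra extension amounts to adding one more entry.
with_zstd_alias = {**EXTENSION_TO_COMPRESSION, ".zstd": "zstd"}
```

With the original mapping, `infer_compression("shard.jsonl.zstd")` returns `None`, so the file would be treated as uncompressed or skipped; with `with_zstd_alias` it resolves to `"zstd"`.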