-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate config name and data_files in packaged modules #6915
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
tests/test_load.py
Outdated
@@ -1749,7 +1749,7 @@ def test_resolve_trust_remote_code_future(trust_remote_code, expected): | |||
def test_reload_old_cache_from_2_15(tmp_path: Path): | |||
cache_dir = tmp_path / "test_reload_old_cache_from_2_15" | |||
builder_cache_dir = ( | |||
cache_dir / "polinaeterna___audiofolder_two_configs_in_metadata/v2-374bfde4f55442bc/0.0.0/7896925d64deea5d" | |||
cache_dir / "polinaeterna___audiofolder_two_configs_in_metadata/v2-374bfde4f55442bc/0.0.0/cf191ad706de653e" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhoestq note that the hash now will be different from before this PR. I hope this does not have a negative impact...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah it's not supposed to change if we want to support reloading old cache directories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhoestq then we cannot change the packaged builders code anymore?
I pushed a change that fixes 2.15 cache reloading (I fixed the packaged module hash), feel free to merge if this change is fine for you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhoestq then we cannot change the packaged builders code anymore?
Oh, I saw your fix!
That will make the code more stable against future changes in the package builder files. Thanks.
Something weird happened in GitHub: I just merged this PR to main, See: 5bbbf1b However this PR still appears as Open... If I retry to merge this PR, an error appears: "Merge attempt failed: Merge already in progress" |
This reverts commit 09ebf51.
Show benchmarksPyArrow==8.0.0 Show updated benchmarks!Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
Show updated benchmarks!Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
|
…6915) * Make configs call super post_init in packaged modules * Update hash in test * Add tests * Add tests for BuilderConfig * Fix syntax * use old hash for 2.15 cache reload --------- Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
Validate the config attributes
name
anddata_files
in packaged modules by making the derived classes call their parent__post_init__
method.Note that their parent
BuilderConfig
validates its attributesname
anddata_files
in its__post_init__
method:datasets/src/datasets/builder.py
Lines 128 to 137 in 60d21ef
This PR makes the derived config classes call their parent
__post_init__
method to validate theirname
anddata_files
attributes.