-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better error handling in dataset_module_factory
#6959
Better error handling in dataset_module_factory
#6959
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Much clearer now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a test needs being fixed because of the change in the error messages:
FAILED tests/test_load.py::LoadTest::test_load_dataset_from_hub - AssertionError: "at revision '0.0.0'" not found in "Dataset '_dummy' doesn't exist on the Hub or cannot be accessed."
Test should be fixed by ef8f7ce (tested locally). Let's see what CI says 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks !
src/datasets/load.py
Outdated
raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({e.__class__.__name__})") from e | ||
except GatedRepoError as e: | ||
raise DatasetNotFoundError( | ||
f"Dataset '{path}' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/{path} to ask for access." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe mention that the user may have to login ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in 62350f5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Show benchmarksPyArrow==8.0.0 Show updated benchmarks!Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
Show updated benchmarks!Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
|
cc @cakiki who reported it on slack (private link)
This PR updates how errors are handled in
dataset_module_factory
when thedataset_info
cannot be accessed:except ... as e
instead of usingisinstance(e, ...)
DatasetNotFoundError
withfrom e
so that the initial error is explicitly logged in the stacktrace.RepoNotFoundError
/GatedRepoError
/RevisionNotFoundError
cases