Specify columns when reading files with DocumentDataset #311
Closes #180.
All of these will now work:
(1) Pandas and cuDF `read_json` do not support a `columns` parameter, so we read in the entire DataFrame and then remove unwanted columns behind the scenes.
(2) Pandas and cuDF `read_parquet` both support a `columns` parameter, so we are able to take advantage of this functionality.
(3) Pandas `read_pickle` (there is no cuDF `read_pickle`) does not support a `columns` parameter, so we read in the entire DataFrame and then remove unwanted columns behind the scenes.
(4) Following `cudf.read_json`, you can specify `dtype` and `prune_columns=True` to only return the columns mentioned in the `dtype` argument. Note that Pandas does not support `prune_columns`.
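For illustration, the strategies listed above can be sketched in plain pandas. This is a minimal sketch, not the `DocumentDataset` implementation. It assumes JSON Lines input, and every helper name in it (`read_json_columns`, `read_pickle_columns`, `read_parquet_columns`, `read_json_pruned`) is hypothetical. The "read everything, then select" path corresponds to (1) and (3), the pass-through to (2), and `read_json_pruned` shows how the effect of cuDF's `prune_columns=True` from (4) could be approximated in pandas.

```python
import io

import pandas as pd

# Hypothetical helpers for illustration only; not the DocumentDataset API.


def _select(df: pd.DataFrame, columns=None) -> pd.DataFrame:
    # Keep only the requested columns, in the requested order.
    # columns=None means "keep everything".
    return df if columns is None else df[list(columns)]


def read_json_columns(src, columns=None) -> pd.DataFrame:
    # pandas.read_json has no `columns` argument: read the whole
    # JSON Lines input (assumed format), then drop unwanted columns.
    return _select(pd.read_json(src, lines=True), columns)


def read_pickle_columns(src, columns=None) -> pd.DataFrame:
    # pandas.read_pickle has no `columns` argument either.
    return _select(pd.read_pickle(src), columns)


def read_parquet_columns(src, columns=None) -> pd.DataFrame:
    # read_parquet accepts `columns` natively, so unwanted columns are
    # never loaded (needs a Parquet engine such as pyarrow installed).
    return pd.read_parquet(src, columns=columns)


def read_json_pruned(src, dtype: dict) -> pd.DataFrame:
    # Approximates cudf.read_json(..., dtype=..., prune_columns=True) in
    # pandas, which lacks `prune_columns`: keep only the columns named in
    # `dtype` and cast them to the requested types.
    df = pd.read_json(src, lines=True)
    return df[list(dtype)].astype(dtype)


data = (
    '{"id": 1, "text": "a", "score": 0.5}\n'
    '{"id": 2, "text": "b", "score": 0.9}\n'
)
print(read_json_columns(io.StringIO(data), columns=["text", "id"]))
print(read_json_pruned(io.StringIO(data), dtype={"id": "int32", "text": "str"}).dtypes)
```

Selecting after the read keeps the behavior uniform across readers, at the cost of loading every column into memory first; only the Parquet path avoids that.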