Update use_with_tensorflow.mdx
lhoestq authored Jun 4, 2024
1 parent 73d6dbc commit 0651980
Showing 1 changed file with 15 additions and 4 deletions.
19 changes: 15 additions & 4 deletions docs/source/use_with_tensorflow.mdx
@@ -41,19 +41,30 @@ array([[1, 2],

## N-dimensional arrays

-If your dataset consists of N-dimensional arrays, you will see that by default they are considered as nested lists.
-In particular, a TensorFlow formatted dataset outputs a `RaggedTensor` instead of a single tensor:
+If your dataset consists of N-dimensional arrays with a fixed shape, they are returned as a single tensor by default (first example).
+Otherwise, when the shape varies, a TensorFlow formatted dataset outputs a `RaggedTensor` instead of a single tensor (second example):

```py
>>> from datasets import Dataset
>>> data = [[[1, 2],[3, 4]],[[5, 6],[7, 8]]]
>>> ds = Dataset.from_dict({"data": data})
>>> ds = ds.with_format("tf")
>>> ds[0]
-{'data': <tf.RaggedTensor [[1, 2], [3, 4]]>}
+{'data': <tf.Tensor: shape=(2, 2), dtype=int64, numpy=
+array([[1, 2],
+       [3, 4]])>}
```

+```py
+>>> from datasets import Dataset
+>>> data = [[[1, 2],[3]],[[4, 5, 6],[7, 8]]]  # varying shape
+>>> ds = Dataset.from_dict({"data": data})
+>>> ds = ds.with_format("tf")
+>>> ds[0]
+{'data': <tf.RaggedTensor [[1, 2], [3]]>}
+```
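
The fixed-shape versus varying-shape distinction in the two examples above can be illustrated with a small pure-Python check. This is only a sketch of the idea, not the library's implementation: the helper name `infer_fixed_shape` is hypothetical, and the real formatter may decide between a `tf.Tensor` and a `RaggedTensor` differently.

```python
from typing import Optional, Tuple


def infer_fixed_shape(value) -> Optional[Tuple[int, ...]]:
    """Return the shape of a nested list if it is rectangular, else None.

    Hypothetical helper for illustration only; it is not part of `datasets`.
    """
    if not isinstance(value, list):
        return ()  # a scalar has an empty shape
    child_shapes = [infer_fixed_shape(item) for item in value]
    if any(shape is None for shape in child_shapes):
        return None  # some sub-list is already ragged
    if len(set(child_shapes)) > 1:
        return None  # siblings disagree on shape, so this level is ragged
    inner = child_shapes[0] if child_shapes else ()
    return (len(value),) + inner


fixed = [[1, 2], [3, 4]]  # first row of the fixed-shape example
varying = [[1, 2], [3]]   # first row of the varying-shape example
print(infer_fixed_shape(fixed))    # (2, 2)
print(infer_fixed_shape(varying))  # None
```

A check like this has to visit every element of every row, which gives a feel for why comparing shapes at access time can be slow on large arrays.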

-To get a single tensor, you must explicitly use the [`Array`] feature type and specify the shape of your tensors:
+However, this logic often requires slow shape comparisons and data copies. To avoid this, you must explicitly use the [`Array`] feature type and specify the shape of your tensors:

```py
>>> from datasets import Dataset, Features, Array2D