chore: fix typos in docs (#7034)
fix typos
hattizai authored and albertvillanova committed Aug 13, 2024
1 parent be43e09 commit af818af
Showing 3 changed files with 5 additions and 5 deletions.
6 changes: 3 additions & 3 deletions docs/source/about_mapstyle_vs_iterable.mdx
Original file line number Diff line number Diff line change
@@ -166,7 +166,7 @@ It provides even faster data loading when iterating using a `for` loop by iterat
However as soon as your [`Dataset`] has an indices mapping (via [`Dataset.shuffle`] for example), the speed can become 10x slower.
This is because there is an extra step to get the row index to read using the indices mapping, and most importantly, you aren't reading contiguous chunks of data anymore.
To restore the speed, you'd need to rewrite the entire dataset on your disk again using [`Dataset.flatten_indices`], which removes the indices mapping.
-This may take a lot of time depending of the size of your dataset though:
+This may take a lot of time depending on the size of your dataset though:

```python
my_dataset[0] # fast
@@ -211,9 +211,9 @@ To restart the iteration of a map-style dataset, you can simply skip the first e
my_dataset = my_dataset.select(range(start_index, len(dataset)))
```

-But if you use a `DataLoader` with a `Sampler`, you should instead save the state of your sampler (you might have write a custom sampler that allows resuming).
+But if you use a `DataLoader` with a `Sampler`, you should instead save the state of your sampler (you might have written a custom sampler that allows resuming).

-On the other hand, iterable datasets don't provide random access to a specific example inde to resume from. But you can use [`IterableDataset.state_dict`] and [`IterableDataset.load_state_dict`] to resume from a checkpoint instead, similarly to what you can do for models and optimizers:
+On the other hand, iterable datasets don't provide random access to a specific example index to resume from. But you can use [`IterableDataset.state_dict`] and [`IterableDataset.load_state_dict`] to resume from a checkpoint instead, similarly to what you can do for models and optimizers:

```python
>>> iterable_dataset = Dataset.from_dict({"a": range(6)}).to_iterable_dataset(num_shards=3)
2 changes: 1 addition & 1 deletion docs/source/dataset_card.mdx
@@ -4,7 +4,7 @@ Each dataset should have a dataset card to promote responsible usage and inform
This idea was inspired by the Model Cards proposed by [Mitchell, 2018](https://arxiv.org/abs/1810.03993).
Dataset cards help users understand a dataset's contents, the context for using the dataset, how it was created, and any other considerations a user should be aware of.

-Creating a dataset card is easy and can be done in a just a few steps:
+Creating a dataset card is easy and can be done in just a few steps:

1. Go to your dataset repository on the [Hub](https://hf.co/new-dataset) and click on **Create Dataset Card** to create a new `README.md` file in your repository.

2 changes: 1 addition & 1 deletion docs/source/faiss_es.mdx
@@ -1,6 +1,6 @@
# Search index

-[FAISS](https://github.com/facebookresearch/faiss) and [Elasticsearch](https://www.elastic.co/elasticsearch/) enables searching for examples in a dataset. This can be useful when you want to retrieve specific examples from a dataset that are relevant to your NLP task. For example, if you are working on a Open Domain Question Answering task, you may want to only return examples that are relevant to answering your question.
+[FAISS](https://github.com/facebookresearch/faiss) and [Elasticsearch](https://www.elastic.co/elasticsearch/) enables searching for examples in a dataset. This can be useful when you want to retrieve specific examples from a dataset that are relevant to your NLP task. For example, if you are working on an Open Domain Question Answering task, you may want to only return examples that are relevant to answering your question.

This guide will show you how to build an index for your dataset that will allow you to search it.

