I am quite interested in using `LeRobotDataset` for large-scale training, and I would like more context on the options for storing images so I am aware of their implications:

- Did you by chance study whether mp4 video compression has any negative effect on image quality in terms of model performance (or are there any studies you based your decision on)?
- I see that at the moment lerobot supports storing images either in `.mp4` or `.pt`, but not in arrow or parquet format as many other HF datasets do. Is there any specific reason you didn't add support for arrow/parquet, which also provide memory mapping? Any idea how pytorch would compare to arrow/parquet when using datasets of hundreds of millions of examples?
We compared PNG frames versus mp4-compressed video on the PushT and Aloha environments in simulation, and we didn't notice a lower success rate. You can reproduce this result, since we currently support both image and video datasets. For instance:
> Any ideas how pytorch would compare to arrow / parquet when using datasets of 100s of millions of examples?
Our current data format uses parquet to store the data on the Hub, then arrow once it is downloaded to the cache (through HF datasets), and HF datasets loads the arrow data as pytorch tensors. It's fast enough for now. We are still iterating on the format to make it simpler, faster, and scalable.