yaak-ai/rbyte

rbyte provides a PyTorch Dataset with tensorclass samples built from multimodal data

Installation

uv add https://github.com/yaak-ai/rbyte/releases/latest/download/rbyte-X.Y.Z-py3-none-any.whl [--extra mcap] [--extra jpeg] [--extra visualize]

Examples

See examples/config_templates (ytt templates) and justfile for usage examples.

nuScenes x mcap

Set up a new project with uv:

uv init nuscenes_mcap
cd nuscenes_mcap

uv add hydra-core omegaconf
uv add https://github.com/yaak-ai/rbyte/releases/latest/download/rbyte-0.3.0-py3-none-any.whl --extra mcap --extra jpeg --extra visualize

mkdir data

Follow the guide at foxglove/nuscenes2mcap and move the resulting .mcap files under data/. In this example we're using a subset of topics from NuScenes-v1.0-mini-scene-0103.mcap:

mcap info data/NuScenes-v1.0-mini-scene-0103.mcap
library:   nuscenes2mcap
profile:
messages:  34764
duration:  19.443428s
start:     2018-08-01T21:26:43.504799+02:00 (1533151603.504799000)
end:       2018-08-01T21:27:02.948227+02:00 (1533151622.948227000)
compression:
        lz4: [629/629 chunks] [753.36 MiB/481.51 MiB (36.09%)] [24.76 MiB/sec]
channels:
        (1)  /imu                                     1933 msgs (99.42 Hz)     : IMU [jsonschema]
        (2)  /odom                                     968 msgs (49.79 Hz)     : Pose [jsonschema]
        (3)  /map                                        1 msgs                : foxglove.Grid [protobuf]
        (4)  /semantic_map                               1 msgs                : foxglove.SceneUpdate [protobuf]
        (5)  /tf                                      3103 msgs (159.59 Hz)    : foxglove.FrameTransform [protobuf]
        (6)  /drivable_area                             40 msgs (2.06 Hz)      : foxglove.Grid [protobuf]
        (7)  /RADAR_FRONT                              266 msgs (13.68 Hz)     : foxglove.PointCloud [protobuf]
        (8)  /RADAR_FRONT_LEFT                         258 msgs (13.27 Hz)     : foxglove.PointCloud [protobuf]
        (9)  /RADAR_FRONT_RIGHT                        259 msgs (13.32 Hz)     : foxglove.PointCloud [protobuf]
        (10) /RADAR_BACK_LEFT                          252 msgs (12.96 Hz)     : foxglove.PointCloud [protobuf]
        (11) /RADAR_BACK_RIGHT                         255 msgs (13.11 Hz)     : foxglove.PointCloud [protobuf]
        (12) /LIDAR_TOP                                389 msgs (20.01 Hz)     : foxglove.PointCloud [protobuf]
        (13) /CAM_FRONT/image_rect_compressed          229 msgs (11.78 Hz)     : foxglove.CompressedImage [protobuf]
        (14) /CAM_FRONT/camera_info                    229 msgs (11.78 Hz)     : foxglove.CameraCalibration [protobuf]
        (15) /CAM_FRONT/lidar                          229 msgs (11.78 Hz)     : foxglove.ImageAnnotations [protobuf]
        (16) /CAM_FRONT/annotations                     40 msgs (2.06 Hz)      : foxglove.ImageAnnotations [protobuf]
        (17) /CAM_FRONT_RIGHT/image_rect_compressed    233 msgs (11.98 Hz)     : foxglove.CompressedImage [protobuf]
        (18) /CAM_FRONT_RIGHT/camera_info              233 msgs (11.98 Hz)     : foxglove.CameraCalibration [protobuf]
        (19) /CAM_FRONT_RIGHT/lidar                    233 msgs (11.98 Hz)     : foxglove.ImageAnnotations [protobuf]
        (20) /CAM_FRONT_RIGHT/annotations               40 msgs (2.06 Hz)      : foxglove.ImageAnnotations [protobuf]
        (21) /CAM_BACK_RIGHT/image_rect_compressed     234 msgs (12.03 Hz)     : foxglove.CompressedImage [protobuf]
        (22) /CAM_BACK_RIGHT/camera_info               234 msgs (12.03 Hz)     : foxglove.CameraCalibration [protobuf]
        (23) /CAM_BACK_RIGHT/lidar                     234 msgs (12.03 Hz)     : foxglove.ImageAnnotations [protobuf]
        (24) /CAM_BACK_RIGHT/annotations                40 msgs (2.06 Hz)      : foxglove.ImageAnnotations [protobuf]
        (25) /CAM_BACK/image_rect_compressed           229 msgs (11.78 Hz)     : foxglove.CompressedImage [protobuf]
        (26) /CAM_BACK/camera_info                     229 msgs (11.78 Hz)     : foxglove.CameraCalibration [protobuf]
        (27) /CAM_BACK/lidar                           229 msgs (11.78 Hz)     : foxglove.ImageAnnotations [protobuf]
        (28) /CAM_BACK/annotations                      40 msgs (2.06 Hz)      : foxglove.ImageAnnotations [protobuf]
        (29) /CAM_BACK_LEFT/image_rect_compressed      228 msgs (11.73 Hz)     : foxglove.CompressedImage [protobuf]
        (30) /CAM_BACK_LEFT/camera_info                228 msgs (11.73 Hz)     : foxglove.CameraCalibration [protobuf]
        (31) /CAM_BACK_LEFT/lidar                      228 msgs (11.73 Hz)     : foxglove.ImageAnnotations [protobuf]
        (32) /CAM_BACK_LEFT/annotations                 40 msgs (2.06 Hz)      : foxglove.ImageAnnotations [protobuf]
        (33) /CAM_FRONT_LEFT/image_rect_compressed     231 msgs (11.88 Hz)     : foxglove.CompressedImage [protobuf]
        (34) /CAM_FRONT_LEFT/camera_info               231 msgs (11.88 Hz)     : foxglove.CameraCalibration [protobuf]
        (35) /CAM_FRONT_LEFT/lidar                     231 msgs (11.88 Hz)     : foxglove.ImageAnnotations [protobuf]
        (36) /CAM_FRONT_LEFT/annotations                40 msgs (2.06 Hz)      : foxglove.ImageAnnotations [protobuf]
        (37) /pose                                      40 msgs (2.06 Hz)      : foxglove.PoseInFrame [protobuf]
        (38) /gps                                       40 msgs (2.06 Hz)      : foxglove.LocationFix [protobuf]
        (39) /markers/annotations                       40 msgs (2.06 Hz)      : foxglove.SceneUpdate [protobuf]
        (40) /markers/car                               40 msgs (2.06 Hz)      : foxglove.SceneUpdate [protobuf]
        (41) /diagnostics                            22487 msgs (1156.53 Hz)   : diagnostic_msgs/DiagnosticArray [ros1msg]
attachments: 0
metadata: 1
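
To sanity-check a topic before wiring it into a config, you can decode messages directly with the mcap Python package and the mcap_protobuf decoder referenced in the config below. A minimal sketch (not part of rbyte itself), assuming both packages are installed alongside the mcap extra:

# Peek at the first decoded message on a protobuf topic
from mcap.reader import make_reader
from mcap_protobuf.decoder import DecoderFactory

with open("data/NuScenes-v1.0-mini-scene-0103.mcap", "rb") as f:
    reader = make_reader(f, decoder_factories=[DecoderFactory()])
    for schema, channel, message, decoded in reader.iter_decoded_messages(
        topics=["/CAM_FRONT/image_rect_compressed"]
    ):
        print(channel.topic, message.log_time, type(decoded).__name__)
        break  # first message only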

Create a config.yaml to extract frames from three cameras along with the ego velocity (/odom/vel.x), aligning everything to the /CAM_FRONT timestamps:

---
dataloader:
  _target_: torch.utils.data.DataLoader
  dataset: ${dataset}
  batch_size: 1
  collate_fn:
    _target_: rbyte.utils.dataloader.collate_identity
    _partial_: true

dataset:
  _target_: rbyte.Dataset
  _convert_: all
  _recursive_: false
  inputs:
    NuScenes-v1.0-mini-scene-0103:
      frame:
        /CAM_FRONT/image_rect_compressed:
          index_column: /CAM_FRONT/image_rect_compressed/_idx_
          reader:
            _target_: rbyte.io.frame.mcap.McapFrameReader
            path: data/NuScenes-v1.0-mini-scene-0103.mcap
            topic: /CAM_FRONT/image_rect_compressed
            decoder_factory: mcap_protobuf.decoder.DecoderFactory
            frame_decoder: ${frame_decoder}

        /CAM_FRONT_LEFT/image_rect_compressed:
          index_column: /CAM_FRONT_LEFT/image_rect_compressed/_idx_
          reader:
            _target_: rbyte.io.frame.mcap.McapFrameReader
            path: data/NuScenes-v1.0-mini-scene-0103.mcap
            topic: /CAM_FRONT_LEFT/image_rect_compressed
            decoder_factory: mcap_protobuf.decoder.DecoderFactory
            frame_decoder: ${frame_decoder}

        /CAM_FRONT_RIGHT/image_rect_compressed:
          index_column: /CAM_FRONT_RIGHT/image_rect_compressed/_idx_
          reader:
            _target_: rbyte.io.frame.mcap.McapFrameReader
            path: data/NuScenes-v1.0-mini-scene-0103.mcap
            topic: /CAM_FRONT_RIGHT/image_rect_compressed
            decoder_factory: mcap_protobuf.decoder.DecoderFactory
            frame_decoder: ${frame_decoder}

      table:
        path: data/NuScenes-v1.0-mini-scene-0103.mcap
        builder:
          _target_: rbyte.io.table.TableBuilder
          _convert_: all
          reader:
            _target_: rbyte.io.table.mcap.McapTableReader
            _recursive_: false
            decoder_factories:
              - mcap_protobuf.decoder.DecoderFactory
              - rbyte.utils.mcap.McapJsonDecoderFactory
            fields:
              /CAM_FRONT/image_rect_compressed:
                _idx_:
                log_time:
                  _target_: polars.Datetime
                  time_unit: ns

              /CAM_FRONT_LEFT/image_rect_compressed:
                _idx_:
                log_time:
                  _target_: polars.Datetime
                  time_unit: ns

              /CAM_FRONT_RIGHT/image_rect_compressed:
                _idx_:
                log_time:
                  _target_: polars.Datetime
                  time_unit: ns

              /odom:
                log_time:
                  _target_: polars.Datetime
                  time_unit: ns
                vel.x: null

          merger:
            _target_: rbyte.io.table.TableAligner
            separator: /
            merge:
              /CAM_FRONT/image_rect_compressed:
                log_time:
                  method: ref

              /CAM_FRONT_LEFT/image_rect_compressed:
                _idx_:
                  method: asof
                  tolerance: 10ms
                  strategy: nearest
                log_time:
                  method: ref

              /CAM_FRONT_RIGHT/image_rect_compressed:
                _idx_:
                  method: asof
                  tolerance: 10ms
                  strategy: nearest
                log_time:
                  method: ref

              /odom:
                log_time:
                  method: ref
                vel.x:
                  method: interp

          filter: |
            `/odom/vel.x` >= 8.6

          cache: !!null

  sample_builder:
    _target_: rbyte.sample.builder.GreedySampleTableBuilder
    index_column: /CAM_FRONT/image_rect_compressed/_idx_

frame_decoder:
  _target_: simplejpeg.decode_jpeg
  _partial_: true
  colorspace: rgb
  fastdct: true
  fastupsample: true
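
The asof and interp merge methods above behave like standard polars time-series alignment: an as-of join snaps each reference row to the closest match within the given tolerance, and interpolation resamples a signal onto the reference timestamps. A rough polars sketch of the as-of idea on toy data (an illustration, not rbyte internals):

# Toy illustration of asof alignment with tolerance/nearest
from datetime import datetime, timedelta

import polars as pl

t0 = datetime(2018, 8, 1, 21, 26, 43)
cam_front = pl.DataFrame({
    "log_time": [t0 + timedelta(milliseconds=85 * i) for i in range(5)],
    "/CAM_FRONT/_idx_": list(range(5)),
}).sort("log_time")
cam_front_left = pl.DataFrame({
    "log_time": [t0 + timedelta(milliseconds=3 + 85 * i) for i in range(5)],
    "/CAM_FRONT_LEFT/_idx_": list(range(5)),
}).sort("log_time")

# For each /CAM_FRONT row, take the /CAM_FRONT_LEFT row closest in time,
# but only if it falls within 10 ms
aligned = cam_front.join_asof(
    cam_front_left, on="log_time", strategy="nearest", tolerance="10ms"
)
print(aligned)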

Build a dataloader and print a batch:

from omegaconf import OmegaConf
from hydra.utils import instantiate


config = OmegaConf.load("config.yaml")
dataloader = instantiate(config.dataloader)
batch = next(iter(dataloader))
print(batch)

Inspect the batch:

Batch(
    frame=TensorDict(
        fields={
            /CAM_FRONT/image_rect_compressed: Tensor(shape=torch.Size([1, 1, 900, 1600, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
            /CAM_FRONT_LEFT/image_rect_compressed: Tensor(shape=torch.Size([1, 1, 900, 1600, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
            /CAM_FRONT_RIGHT/image_rect_compressed: Tensor(shape=torch.Size([1, 1, 900, 1600, 3]), device=cpu, dtype=torch.uint8, is_shared=False)},
        batch_size=torch.Size([1]),
        device=None,
        is_shared=False),
    meta=BatchMeta(
        input_id=NonTensorData(data=['NuScenes-v1.0-mini-scene-0103'], batch_size=torch.Size([1]), device=None),
        sample_idx=Tensor(shape=torch.Size([1]), device=cpu, dtype=torch.int64, is_shared=False),
        batch_size=torch.Size([1]),
        device=None,
        is_shared=False),
    table=TensorDict(
        fields={
            /CAM_FRONT/image_rect_compressed/_idx_: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            /CAM_FRONT/image_rect_compressed/log_time: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            /CAM_FRONT_LEFT/image_rect_compressed/_idx_: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            /CAM_FRONT_LEFT/image_rect_compressed/log_time: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            /CAM_FRONT_RIGHT/image_rect_compressed/_idx_: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            /CAM_FRONT_RIGHT/image_rect_compressed/log_time: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.int64, is_shared=False),
            /odom/vel.x: Tensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.float64, is_shared=False)},
        batch_size=torch.Size([1]),
        device=None,
        is_shared=False),
    batch_size=torch.Size([1]),
    device=None,
    is_shared=False)
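
The frame and table entries are TensorDicts keyed by topic/field, so individual tensors can be pulled straight out of the batch, e.g.:

frames = batch.frame["/CAM_FRONT/image_rect_compressed"]  # uint8, shape [B, T, H, W, C]
speed = batch.table["/odom/vel.x"]                        # float64, shape [B, T]
print(frames.shape, speed.shape)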

Append a logger to config.yaml:

logger:
  _target_: rbyte.viz.loggers.RerunLogger
  schema:
    frame:
      /CAM_FRONT/image_rect_compressed:
        Image:
          color_model: RGB

      /CAM_FRONT_LEFT/image_rect_compressed:
        Image:
          color_model: RGB

      /CAM_FRONT_RIGHT/image_rect_compressed:
        Image:
          color_model: RGB

    table:
      /CAM_FRONT/image_rect_compressed/log_time: TimeNanosColumn
      /CAM_FRONT/image_rect_compressed/_idx_: TimeSequenceColumn
      /CAM_FRONT_LEFT/image_rect_compressed/_idx_: TimeSequenceColumn
      /CAM_FRONT_RIGHT/image_rect_compressed/_idx_: TimeSequenceColumn
      /odom/vel.x: Scalar

Visualize the dataset:

from omegaconf import OmegaConf
from hydra.utils import instantiate


config = OmegaConf.load("config.yaml")
dataloader = instantiate(config.dataloader)
logger = instantiate(config.logger)

for batch_idx, batch in enumerate(dataloader):
    logger.log(batch_idx, batch)

[rerun viewer screenshot]

Development

  1. Install the required tools: uv and just
  2. Clone:
git clone https://github.com/yaak-ai/rbyte
  3. Set up:
just setup