Add `BackendRouter` to handle multiple backends #2353

laggui · 2024-10-09T18:59:03Z

Needs more tests :)

Checklist

Confirmed that `run-checks all` script has been executed.
Made sure the book is up to date with changes in this PR.

Related Issues/PRs

Changes

Introduces a new BackendRouter responsible for forwarding tensor operations to the appropriate backend, given multiple backends.

This is achieved with the help of the intermediate representation defined for ReprBackend and the tensor/ops descriptions.
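
To make the routing idea concrete, here is a minimal, self-contained sketch. It does not use the actual `burn_router` types: `ToyRouter`, `ToyRunner`, `OpDescription` and `MultiDevice` are hypothetical stand-ins, and the real crate may be structured quite differently. The sketch only illustrates the principle described above: an operation is expressed as a backend-agnostic description, and the router forwards it to whichever runner owns the target device.

```rust
use std::collections::HashMap;

/// Hypothetical device selector covering two backends.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MultiDevice {
    Device1,
    Device2,
}

/// Hypothetical backend-agnostic description of an operation.
/// Tensors are referred to by id rather than by backend-specific handles.
#[derive(Debug, Clone, Copy)]
enum OpDescription {
    Add { lhs: usize, rhs: usize, out: usize },
}

/// Toy runner: owns the tensors of one backend and executes descriptions.
struct ToyRunner {
    name: &'static str,
    tensors: HashMap<usize, Vec<f32>>,
}

impl ToyRunner {
    fn new(name: &'static str) -> Self {
        Self { name, tensors: HashMap::new() }
    }

    fn execute(&mut self, op: OpDescription) {
        match op {
            OpDescription::Add { lhs, rhs, out } => {
                // Element-wise addition of two registered tensors.
                let result: Vec<f32> = self.tensors[&lhs]
                    .iter()
                    .zip(self.tensors[&rhs].iter())
                    .map(|(a, b)| a + b)
                    .collect();
                self.tensors.insert(out, result);
            }
        }
    }
}

/// Toy router: picks the runner that matches the requested device.
struct ToyRouter {
    runner1: ToyRunner,
    runner2: ToyRunner,
}

impl ToyRouter {
    fn new() -> Self {
        Self {
            runner1: ToyRunner::new("runner1"),
            runner2: ToyRunner::new("runner2"),
        }
    }

    fn runner(&mut self, device: MultiDevice) -> &mut ToyRunner {
        match device {
            MultiDevice::Device1 => &mut self.runner1,
            MultiDevice::Device2 => &mut self.runner2,
        }
    }
}

/// Registers two tensors on `device`, routes an `Add` to it and reads the result.
fn add_on(device: MultiDevice) -> (&'static str, Vec<f32>) {
    let mut router = ToyRouter::new();
    let runner = router.runner(device);
    runner.tensors.insert(0, vec![1.0, 2.0, 3.0, 4.0]);
    runner.tensors.insert(1, vec![5.0, 6.0, 7.0, 8.0]);
    runner.execute(OpDescription::Add { lhs: 0, rhs: 1, out: 2 });
    (runner.name, runner.tensors[&2].clone())
}

fn main() {
    let (name, data) = add_on(MultiDevice::Device2);
    println!("{name}: {data:?}");
}
```

Because the description only carries tensor ids, the same `OpDescription` can be handed to either runner; that is the property the intermediate representation is meant to provide.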

Testing

Modified the ag-news-train text classification example to run on CUDA + wgpu. Also have an MWE (minimal working example):

use burn::{
    backend::{cuda_jit::CudaDevice, wgpu::WgpuDevice, CudaJit, Wgpu},
    tensor::Tensor,
};
use burn_router::{BackendRouter, ByteBridge, DirectChannel, MultiDevice2};

fn main() {
    type DirectByteChannel<Backends> = DirectChannel<Backends, ByteBridge<Backends>>;

    type DualBackend = BackendRouter<DirectByteChannel<(CudaJit, Wgpu)>>;

    let device2 = WgpuDevice::Cpu;
    let device1 = CudaDevice::new(0);

    // TODO: this is wack.. how to automatically implement From<B1::Device1> for MultiDevice2?
    let multi_device1 = MultiDevice2::Device1(device1);
    let multi_device2 = MultiDevice2::Device2(device2);
    let tensor1 = Tensor::<DualBackend, 1>::from_floats([1.0, 2.0, 3.0, 4.0], &multi_device1);
    let tensor2 = Tensor::<DualBackend, 1>::from_floats([5.0, 6.0, 7.0, 8.0], &multi_device2);

    println!("Tensor 1:\n{tensor1}");
    println!("Tensor 2:\n{tensor2}");

    let tensor1 = tensor1.to_device(&multi_device2);

    let output = tensor1.add(tensor2);

    println!("Result:\n{output}");
}

Tensor 1:
Tensor {
  data:
[1.0, 2.0, 3.0, 4.0],
  shape:  [4],
  device:  Device1(CudaDevice { index: 0 }),
  backend:  "router<direct<(fusion<jit<cuda>>, fusion<jit<wgpu>>)>>",
  kind:  "Float",
  dtype:  "f32",
}
Tensor 2:
Tensor {
  data:
[5.0, 6.0, 7.0, 8.0],
  shape:  [4],
  device:  Device2(Cpu),
  backend:  "router<direct<(fusion<jit<cuda>>, fusion<jit<wgpu>>)>>",
  kind:  "Float",
  dtype:  "f32",
}
Result:
Tensor {
  data:
[6.0, 8.0, 10.0, 12.0],
  shape:  [4],
  device:  Device2(Cpu),
  backend:  "router<direct<(fusion<jit<cuda>>, fusion<jit<wgpu>>)>>",
  kind:  "Float",
  dtype:  "f32",
}
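
Regarding the TODO in the MWE about wrapping devices by hand: one possible direction, sketched below under stated assumptions, is to generate `From` impls for each concrete device pair with a macro. The types `FakeCudaDevice`, `FakeWgpuDevice` and the generic shape of `MultiDevice2` here are hypothetical stand-ins, not the definitions used in this PR. A fully generic pair of impls (`From<D1>` and `From<D2>` over all `D1`, `D2`) is rejected by the compiler because the two overlap when both devices are the same type, which is why the sketch stays with concrete types.

```rust
/// Hypothetical stand-in for a CUDA device type.
#[derive(Debug, Clone, PartialEq)]
struct FakeCudaDevice {
    index: usize,
}

/// Hypothetical stand-in for a wgpu device type.
#[derive(Debug, Clone, PartialEq)]
enum FakeWgpuDevice {
    Cpu,
}

/// Assumed shape of a two-device enum (the real `MultiDevice2` may differ).
#[derive(Debug, Clone, PartialEq)]
enum MultiDevice2<D1, D2> {
    Device1(D1),
    Device2(D2),
}

// A blanket pair such as
//   impl<D1, D2> From<D1> for MultiDevice2<D1, D2>
//   impl<D1, D2> From<D2> for MultiDevice2<D1, D2>
// fails with E0119 (conflicting implementations) because the impls
// overlap when D1 == D2. Generating impls per concrete pair avoids that.
macro_rules! impl_multi_device_from {
    ($d1:ty, $d2:ty) => {
        impl From<$d1> for MultiDevice2<$d1, $d2> {
            fn from(device: $d1) -> Self {
                MultiDevice2::Device1(device)
            }
        }

        impl From<$d2> for MultiDevice2<$d1, $d2> {
            fn from(device: $d2) -> Self {
                MultiDevice2::Device2(device)
            }
        }
    };
}

impl_multi_device_from!(FakeCudaDevice, FakeWgpuDevice);

type DualDevice = MultiDevice2<FakeCudaDevice, FakeWgpuDevice>;

fn main() {
    // With the impls in place, `.into()` replaces the manual wrapping.
    let d1: DualDevice = FakeCudaDevice { index: 0 }.into();
    let d2: DualDevice = FakeWgpuDevice::Cpu.into();
    println!("{d1:?} {d2:?}");
}
```

The trade-off is that the macro has to be invoked once per supported pair, and it cannot be used for a pair made of the same device type twice.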

… (WIP)

codecov · 2024-10-09T19:14:03Z

Codecov Report

Attention: Patch coverage is 0.16560% with 4823 lines in your changes missing coverage. Please review.

Project coverage is 81.10%. Comparing base (c98b689) to head (0452d2c).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/burn-router/src/ops/op_float.rs	0.00%	1299 Missing ⚠️
crates/burn-router/src/ops/op_int.rs	0.00%	1064 Missing ⚠️
crates/burn-router/src/runner.rs	0.00%	1001 Missing ⚠️
crates/burn-router/src/ops/op_module.rs	0.00%	768 Missing ⚠️
crates/burn-router/src/ops/op_bool.rs	0.00%	216 Missing ⚠️
crates/burn-router/src/bridge/byte.rs	0.00%	105 Missing ⚠️
crates/burn-router/src/channel/direct.rs	0.00%	99 Missing ⚠️
crates/burn-router/src/tensor.rs	0.00%	80 Missing ⚠️
crates/burn-router/src/ops/op_qfloat.rs	0.00%	54 Missing ⚠️
crates/burn-router/src/client/base.rs	0.00%	44 Missing ⚠️
... and 5 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2353      +/-   ##
==========================================
- Coverage   85.23%   81.10%   -4.14%     
==========================================
  Files         770      782      +12     
  Lines       98976   103525    +4549     
==========================================
- Hits        84358    83959     -399     
- Misses      14618    19566    +4948


nathanielsimard and others added 22 commits October 9, 2024 14:21

WIP

0f61e67

it compiles

f0f46ef

WIP

a4acc54

Remove const D (w/ rebase) + WIP BackendRouter

458cf64

Add missing types from merge

0aeb0ad

Rework traits, types and add MultiBackendBridge & RunnerClientLocator (WIP)

b33fb50

First draft ByteBridge to_backend(tensor, device)

2dd5405

Refactor into modules

d815569

Add mutex, fix types

8ffbfaf

Remove StreamId and implement ReprBackend for Fusion (WIP)

37b484a

float_add op working (w/o fusion)

0251283

Small cleanup

6ed84b5

Remove comment

afc5cd1

Cleanup

ea092d9

Fix fusion ReprBackend implementation (duhhh)

98e84ac

Add runner ops

ed232a4

More ops

916ec4f

Cleanup

d5a4199

Add name

c1cb364

Update Cargo.lock

af0f694

Undo fusion stream changes to common

3ccba7c

Clippy + cleanup

e8d430f

laggui added 7 commits October 9, 2024 15:18

Fix no-std

b5ddbbc

Deal with unused tensors

46cd852

Clippy baby

75dc5c9

Fix comment

b063b49

Fix tensor handle orphans management

a8a9d75

Implement runner read_tensor for other dtypes

ab384d7

Move backend router to its own crate

0452d2c
