Commit 8808ab9: Fix var names

lezcano committed Sep 18, 2023
1 parent 3b6d734

Showing 1 changed file with 6 additions and 4 deletions.
````diff
--- a/blogpost/post.md
+++ b/blogpost/post.md
@@ -31,7 +31,7 @@ npts = 10_000_000
 X = np.repeat([[5, 5], [10, 10]], [npts, npts], axis=0)
 X = X + np.random.randn(*X.shape)  # 2 distinct "blobs"
 means = np.array([[5, 5], [10, 10]])
-pred = get_labels(X, means)
+np_pred = get_labels(X, means)
 ```
 
 Benchmarking this function gives us a baseline of **1.26s** on an AMD 3970X CPU.
@@ -40,9 +40,11 @@ Compiling this function is now as easy as wrapping it with `torch.compile` and
 executing it with the example inputs
 
 ```python
+import torch
+
 compiled_fn = torch.compile(get_labels)
-new_pred = compiled_fn(X, means)
-assert np.allclose(prediction, new_pred)
+torch_pred = compiled_fn(X, means)
+assert np.allclose(np_pred, torch_pred)
 ```
 
 The compiled function yields a 9x speed-up when running it on 1 core. Even
@@ -77,7 +79,7 @@ default device to be CUDA
 ```python
 with torch.device("cuda"):
     cuda_pred = compiled_fn(X, means)
-assert np.allclose(prediction, cuda_pred)
+assert np.allclose(np_pred, cuda_pred)
 ```
 
 By inspecting the generated code via `TORCH_LOGS=output_code`, we see that,
````
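For context, the snippet this diff touches can be run end to end. The body of `get_labels` is not shown in these hunks, so the nearest-mean assignment below is an assumption, a stand-in consistent with the surrounding two-blob example:

```python
import numpy as np

# Hypothetical body for get_labels: assign each point to its nearest mean.
# The real definition lives elsewhere in blogpost/post.md; this is a sketch.
def get_labels(X, means):
    return np.argmin(np.linalg.norm(X - means[:, None], axis=2), axis=0)

npts = 10_000  # scaled down from the post's 10_000_000 for a quick run
X = np.repeat([[5, 5], [10, 10]], [npts, npts], axis=0)
X = X + np.random.randn(*X.shape)  # 2 distinct "blobs"
means = np.array([[5, 5], [10, 10]])
np_pred = get_labels(X, means)  # name introduced by this commit (was `pred`)
```

With blobs centered at (5, 5) and (10, 10) and unit noise, nearly every point in the first half gets label 0 and the second half label 1, which is what the post's `np.allclose` checks rely on.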

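The diff's context mentions a 1.26s CPU baseline for the eager NumPy version; a timing harness along these lines produces that kind of measurement. The `get_labels` body is the same hypothetical stand-in as above, and absolute numbers depend on the machine:

```python
import timeit

import numpy as np

# Hypothetical stand-in for the function benchmarked in the post.
def get_labels(X, means):
    return np.argmin(np.linalg.norm(X - means[:, None], axis=2), axis=0)

npts = 100_000  # scaled down from the post's 10_000_000 for a quick run
X = np.repeat([[5, 5], [10, 10]], [npts, npts], axis=0)
X = X + np.random.randn(*X.shape)
means = np.array([[5, 5], [10, 10]])

# Take the best of 5 runs to reduce scheduling noise, as timeit suggests.
elapsed = min(timeit.repeat(lambda: get_labels(X, means), number=1, repeat=5))
print(f"eager NumPy: {elapsed:.4f}s per call")
```

The same harness applied to `torch.compile(get_labels)` (timing only post-warmup calls, since the first call pays compilation cost) is how the post's 9x single-core speed-up would be measured.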