KeOps Block Sparsity: throwing CUDA error #350

kvndhrty · 2023-12-13T04:56:01Z

Hi KeOps team,

I'm moving our PyTorch layer QuadConv to keops to save memory overhead and boost speed. Thus far everything has been fairly smooth, and we see big speed increases on the CPU.

When I move this code to the GPU it throws a inscrutable CUDA error at runtime, but only if I use the ranges field of the LazyTensor before I perform the reduction. Otherwise, without the block sparsity feature this runs smoothly on the GPU.

from pykeops.torch import LazyTensor
from pykeops.torch.cluster import grid_cluster, from_matrix, sort_clusters, cluster_ranges_centroids
import torch

## DEFINE VARS AND FUNCTIONS
def apply_lazy_mlp(x, lazy_list):
    """
    Applies a list of lazy tensors to x
    """
    out = x
    for lazy in lazy_list:
        out = lazy.matvecmult(out).sin()
    return out

B, N, M, I, J = 8, 2500, 2500, 4, 16  # Batch, Sample size, Sample Size, Channel in, Channel out

x = torch.randn(N, 2).cuda()
y = torch.randn(M, 2).cuda()
f = torch.randn(B, N, I).cuda()
rho = torch.randn(1, N, 1).cuda()


L1 = LazyTensor(torch.randn(16).cuda())
L2 = LazyTensor(torch.randn(64).cuda())
L3 = LazyTensor(torch.randn(8*I*J).cuda())

lazy_list = [L1, L2, L3]

## MAKE RANGES

eps = 0.1  # Size of our square bins

alpha = 1/100

x_labels = grid_cluster(x, eps)  # class labels
y_labels = grid_cluster(y, eps)  # class labels

x_ranges, x_centroids, _ = cluster_ranges_centroids(x, x_labels)
y_ranges, y_centroids, _ = cluster_ranges_centroids(y, y_labels)

x, x_labels = sort_clusters(x, x_labels)
y, y_labels = sort_clusters(y, y_labels)
    
keep = (((y_centroids[None, :, :]-x_centroids[:, None, :])**2).sum(-1) - 1 / alpha) < 0

ranges_ij = from_matrix(x_ranges, y_ranges, keep)


## RUN KEOPS CODE
x_i = LazyTensor(x.view(N, 1, 2))
y_j = LazyTensor(y.view(1, M, 2))

f_i = LazyTensor(f.view(B, N, 1, I))

ex_tensor = LazyTensor(torch.ones(1, 1, 1, J).cuda())

ex_f_i = f_i.keops_kron(ex_tensor, [I], [J])

rho_i = LazyTensor(rho.view(N, 1, 1))

G_ij = apply_lazy_mlp(x_i - y_j, lazy_list) * rho_i * ex_f_i

#G_ij.ranges = ranges_ij

out = G_ij.sum_reduction(axis=1).reshape(B, M, I, J).sum(dim=-2)

Commenting in the second to last line makes this code die on my machine.

You can strip down G_ij to something like:

G_ij = apply_lazy_mlp(x_i - y_j, lazy_list)

and it will run with the ranges argument, so the issue seems to be the shape of tensor G_ij and its interaction with ranges but this error only occurs on CUDA.

Any advice?

The text was updated successfully, but these errors were encountered:

kvndhrty · 2023-12-21T19:20:22Z

@bcharlier suggested I install via github to see if that solved my problem. I was running with the tagged 2.1.2 version of keops, installed via pip, but updated to commit dcec73c and my results are the same as before.

bcharlier · 2024-01-25T15:24:46Z

Hi @kvndhrty , the problems come from the kron product... I will test further how to fix that

bcharlier · 2024-01-25T16:10:45Z

after some investigations, this is a NotImplemented Error: This is not possible to use ranges and batch dimension at the same time...

jeanfeydy self-assigned this Dec 15, 2023

bcharlier added the enhancement New feature or request label Jan 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KeOps Block Sparsity: throwing CUDA error #350

KeOps Block Sparsity: throwing CUDA error #350

kvndhrty commented Dec 13, 2023 •

edited by jeanfeydy

Loading

kvndhrty commented Dec 21, 2023

bcharlier commented Jan 25, 2024

bcharlier commented Jan 25, 2024

KeOps Block Sparsity: throwing CUDA error #350

KeOps Block Sparsity: throwing CUDA error #350

Comments

kvndhrty commented Dec 13, 2023 • edited by jeanfeydy Loading

kvndhrty commented Dec 21, 2023

bcharlier commented Jan 25, 2024

bcharlier commented Jan 25, 2024

kvndhrty commented Dec 13, 2023 •

edited by jeanfeydy

Loading