Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeOps Block Sparsity: throwing CUDA error #350

Open
kvndhrty opened this issue Dec 13, 2023 · 3 comments
Open

KeOps Block Sparsity: throwing CUDA error #350

kvndhrty opened this issue Dec 13, 2023 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@kvndhrty
Copy link

kvndhrty commented Dec 13, 2023

Hi KeOps team,

I'm moving our PyTorch layer QuadConv to keops to save memory overhead and boost speed. Thus far everything has been fairly smooth, and we see big speed increases on the CPU.

When I move this code to the GPU it throws a inscrutable CUDA error at runtime, but only if I use the ranges field of the LazyTensor before I perform the reduction. Otherwise, without the block sparsity feature this runs smoothly on the GPU.

from pykeops.torch import LazyTensor
from pykeops.torch.cluster import grid_cluster, from_matrix, sort_clusters, cluster_ranges_centroids
import torch

## DEFINE VARS AND FUNCTIONS
def apply_lazy_mlp(x, lazy_list):
    """
    Applies a list of lazy tensors to x
    """
    out = x
    for lazy in lazy_list:
        out = lazy.matvecmult(out).sin()
    return out

B, N, M, I, J = 8, 2500, 2500, 4, 16  # Batch, Sample size, Sample Size, Channel in, Channel out

x = torch.randn(N, 2).cuda()
y = torch.randn(M, 2).cuda()
f = torch.randn(B, N, I).cuda()
rho = torch.randn(1, N, 1).cuda()


L1 = LazyTensor(torch.randn(16).cuda())
L2 = LazyTensor(torch.randn(64).cuda())
L3 = LazyTensor(torch.randn(8*I*J).cuda())

lazy_list = [L1, L2, L3]

## MAKE RANGES

eps = 0.1  # Size of our square bins

alpha = 1/100

x_labels = grid_cluster(x, eps)  # class labels
y_labels = grid_cluster(y, eps)  # class labels

x_ranges, x_centroids, _ = cluster_ranges_centroids(x, x_labels)
y_ranges, y_centroids, _ = cluster_ranges_centroids(y, y_labels)

x, x_labels = sort_clusters(x, x_labels)
y, y_labels = sort_clusters(y, y_labels)
    
keep = (((y_centroids[None, :, :]-x_centroids[:, None, :])**2).sum(-1) - 1 / alpha) < 0

ranges_ij = from_matrix(x_ranges, y_ranges, keep)


## RUN KEOPS CODE
x_i = LazyTensor(x.view(N, 1, 2))
y_j = LazyTensor(y.view(1, M, 2))

f_i = LazyTensor(f.view(B, N, 1, I))

ex_tensor = LazyTensor(torch.ones(1, 1, 1, J).cuda())

ex_f_i = f_i.keops_kron(ex_tensor, [I], [J])

rho_i = LazyTensor(rho.view(N, 1, 1))

G_ij = apply_lazy_mlp(x_i - y_j, lazy_list) * rho_i * ex_f_i

#G_ij.ranges = ranges_ij

out = G_ij.sum_reduction(axis=1).reshape(B, M, I, J).sum(dim=-2)

Commenting in the second to last line makes this code die on my machine.

You can strip down G_ij to something like:

G_ij = apply_lazy_mlp(x_i - y_j, lazy_list)

and it will run with the ranges argument, so the issue seems to be the shape of tensor G_ij and its interaction with ranges but this error only occurs on CUDA.

Any advice?

@jeanfeydy jeanfeydy self-assigned this Dec 15, 2023
@kvndhrty
Copy link
Author

@bcharlier suggested I install via github to see if that solved my problem. I was running with the tagged 2.1.2 version of keops, installed via pip, but updated to commit dcec73c and my results are the same as before.

@bcharlier
Copy link
Member

Hi @kvndhrty , the problems come from the kron product... I will test further how to fix that

@bcharlier
Copy link
Member

after some investigations, this is a NotImplemented Error: This is not possible to use ranges and batch dimension at the same time...

@bcharlier bcharlier added the enhancement New feature or request label Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants