Hi KeOps team,
I'm moving our PyTorch layer QuadConv to KeOps to reduce memory overhead and boost speed. So far everything has been fairly smooth, and we see big speed increases on the CPU.
When I move this code to the GPU it throws an inscrutable CUDA error at runtime, but only if I use the ranges field of the LazyTensor before I perform the reduction. Without the block-sparsity feature, the same code runs smoothly on the GPU.
from pykeops.torch import LazyTensor
from pykeops.torch.cluster import grid_cluster, from_matrix, sort_clusters, cluster_ranges_centroids
import torch

## DEFINE VARS AND FUNCTIONS

def apply_lazy_mlp(x, lazy_list):
    """ Applies a list of lazy tensors to x """
    out = x
    for lazy in lazy_list:
        out = lazy.matvecmult(out).sin()
    return out

B, N, M, I, J = 8, 2500, 2500, 4, 16  # Batch, Sample size, Sample size, Channel in, Channel out

x = torch.randn(N, 2).cuda()
y = torch.randn(M, 2).cuda()
f = torch.randn(B, N, I).cuda()
rho = torch.randn(1, N, 1).cuda()

L1 = LazyTensor(torch.randn(16).cuda())
L2 = LazyTensor(torch.randn(64).cuda())
L3 = LazyTensor(torch.randn(8 * I * J).cuda())
lazy_list = [L1, L2, L3]

## MAKE RANGES

eps = 0.1       # Size of our square bins
alpha = 1 / 100

x_labels = grid_cluster(x, eps)  # class labels
y_labels = grid_cluster(y, eps)  # class labels

x_ranges, x_centroids, _ = cluster_ranges_centroids(x, x_labels)
y_ranges, y_centroids, _ = cluster_ranges_centroids(y, y_labels)

x, x_labels = sort_clusters(x, x_labels)
y, y_labels = sort_clusters(y, y_labels)

keep = (((y_centroids[None, :, :] - x_centroids[:, None, :]) ** 2).sum(-1) - 1 / alpha) < 0
ranges_ij = from_matrix(x_ranges, y_ranges, keep)

## RUN KEOPS CODE

x_i = LazyTensor(x.view(N, 1, 2))
y_j = LazyTensor(y.view(1, M, 2))
f_i = LazyTensor(f.view(B, N, 1, I))
ex_tensor = LazyTensor(torch.ones(1, 1, 1, J).cuda())
ex_f_i = f_i.keops_kron(ex_tensor, [I], [J])
rho_i = LazyTensor(rho.view(N, 1, 1))

G_ij = apply_lazy_mlp(x_i - y_j, lazy_list) * rho_i * ex_f_i
# G_ij.ranges = ranges_ij
out = G_ij.sum_reduction(axis=1).reshape(B, M, I, J).sum(dim=-2)
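For comparison, the plain single-output block-sparse pattern from the KeOps tutorials, which the code above extends with batch and channel dimensions, would look roughly like this (a minimal sketch reusing x_i, y_j and ranges_ij from above; the Gaussian kernel is just a placeholder formula, not my actual layer):

K_ij = (-((x_i - y_j) ** 2).sum(-1)).exp()  # symbolic (N, M) Gaussian kernel, placeholder
K_ij.ranges = ranges_ij                     # same block-sparsity mask as above
baseline = K_ij.sum_reduction(axis=1)       # (N, 1) reduction over the j axis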
Uncommenting the second-to-last line (G_ij.ranges = ranges_ij) makes this code die on my machine.
You can strip down G_ij to something like:
G_ij = apply_lazy_mlp(x_i - y_j, lazy_list)
and it will run with the ranges argument, so the issue seems to come from the shape of the tensor G_ij and its interaction with ranges, but the error only occurs on CUDA.
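That is, something along these lines completes on the GPU with the block-sparsity mask in place (a sketch reusing the names from the repro above; the exact reduction call is illustrative):

G_ij = apply_lazy_mlp(x_i - y_j, lazy_list)  # drop the rho_i and ex_f_i factors (and the batch dimension)
G_ij.ranges = ranges_ij                      # block-sparsity enabled
out = G_ij.sum_reduction(axis=1)             # this completes on the GPU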
Any advice?
@bcharlier suggested I install from GitHub to see if that solved my problem. I was running the tagged 2.1.2 version of KeOps, installed via pip, but after updating to commit dcec73c my results are the same as before.
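In case it matters, a quick way to double-check which build is actually picked up after switching installs (assuming the standard pykeops utilities):

import pykeops
print(pykeops.__version__)     # confirm which KeOps build is being imported
pykeops.clean_pykeops()        # flush the compilation cache left over from the previous install
pykeops.test_torch_bindings()  # minimal GPU sanity check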